The following analyses aim to describe the sleep habits of a sample of undergraduate students (N = 82) and to evaluate the role of demographical, dispositional and contextual predictors. Since data are hierarchically structured into two levels (Level 1 = night, Level 2 = student), these goals are pursued using linear mixed-effects regression (LMER) models.
Specifically, the analyses are conducted following a hierarchical regression and a model comparison approach, where the predictors of interest are subsequently included in more parsimonious models, starting from a null model (with no predictors). Each model is compared to the previous one in terms of data reproducibility (likelihood ratio test, LRT) and plausibility (Aikake Information Criterion, AIC).
The analyses are performed separately for each considered sleep measure: total sleep time (TST), sleep efficiency (SE), sleep onset latency (SOL), wake after sleep onset (WASO), bedtime (BT), and wake time, (WT). The predictors of interest are included in the following order:
day of week: differences between weekdays and weekend are analyzed, hypothesizing shorter TST and earlier BT and WT during the weekend
sex: differences between females and males are analyzed, hypothesizing earlier bedtime and longer TST for females
circadian preferences: higher MEQ.r scores are hypothesized to predict earlier bed (BT) and wake time (WT)
perceived sleep disturbances: higher PSQI scores hypothesized to predict lower TST and SE, and longer SOL and WASO
depressive symptoms: higher BDI scores are hypothesized to predict more sleep problems, similarly to the above-mentioned hypothesis
students’ nap frequency: daily daytime nap as recorded with actigraphy using a set of rules modified from Patel and colleagues (2015)
students’ habits: participants reporting to smoke, or to drink coffé and alcohol more frequently are hypothesized to show more sleep problems, similarly to the above-mentioned hypothesis
The document is structured as follows:
the dataset is loaded and converted from a wide form (one row per participant) to a long form (one row per night)
descriptives and intraclass correlation coefficients (ICCs) are computed for each sleep measure
a graphical inspection is performed for all sleep measures
the goodness of fit is assessed for each sleep measure to check the suitability of the normal and alternative distributions
an influential analysis is performed to assess the presence of participants influencing the parameters estimation
LMER models comparison is finally performed per each measure and the results of the selected model are discussed
The first step is to load the dataset and to transform it from a wide form (one row per participant) to a long form (one row per night) to be used with LMER models.
rm(list=ls())
library(tidyr); library(textclean); library(dplyr)
# sleep <- read.csv2("Rechecked BT WT all 26.7.19.csv")
sleep <- read.csv2("Rechecked_nap.csv")
# some numeric variables is erroneously read as factor or character
sleep[,13:ncol(sleep)] <- lapply(sleep[,13:ncol(sleep)], as.character)
sleep[,13:ncol(sleep)] <- lapply(sleep[,13:ncol(sleep)], as.numeric)
sleep$AGE <- as.numeric(as.character(sleep$AGE))
sleep$CAFFE_week <- as.numeric(as.character(sleep$CAFFE_week))
sleep$FUMO_Giorno <- as.numeric(as.character(sleep$FUMO_Giorno))
sleep$ALCOL_week <- as.numeric(as.character(sleep$ALCOL_week))
# from wide to long x each variable
sleep.long <- gather(data=sleep,day,BT,BT_LUNEDI:BT_DOMENICA)
sleep.long$WT <- gather(data=sleep,day,WT,WT_LUNEDI:WT_DOMENICA)[,117]
sleep.long$TIB <- gather(data=sleep,day,TIB,TIB_LUNEDI:TIB_DOMENICA)[,117]
sleep.long$TST <- gather(data=sleep,day,TST,TST_LUNEDI:TST_DOMENICA)[,117]
sleep.long$SOL <- gather(data=sleep,day,SOL,SOL_LUNEDI:SOL_DOMENICA)[,117]
sleep.long$WAKE <- gather(data=sleep,day,WAKE,WAKE_LUNEDI:WAKE_DOMENICA)[,117]
sleep.long$SE <- as.numeric(gather(data=sleep,day,SE,SE_LUNEDI:SE_DOMENICA)[,117])
sleep.long$WASO <- gather(data=sleep,day,WASO,WASO_LUNEDI:WASO_DOMENICA)[,117]
sleep.long$NAP <- gather(data=sleep,day,NAP,NAP_LUNEDI:NAP_DOMENICA)[,117]
# converting categorical predictors as factor
sleep.long[c("FUMO..SI.NO.","ALCOL..SI.NO.","CAFFE..SI.NO.",
"NAP")] <- lapply(sleep.long[c("FUMO..SI.NO.","ALCOL..SI.NO.","CAFFE..SI.NO.",
"NAP")], factor)
# converting two cases of NAP = 2 into NAP = 1
summary(sleep.long$NAP)
## 0 1 2
## 485 87 2
sleep.long[sleep.long$NAP==2,"NAP"] <- 1
sleep.long$NAP <- as.factor(as.character(sleep.long$NAP))
# recoding days
sleep.long$day <- mgsub(mgsub(sleep.long$day,
c("BT","WT","TIB","TST","SOL","WAKE","SE","WASO","_"),
rep("",9)),
c("LUNEDI","MARTEDI","MERCOLEDI","GIOVEDI","VENERDI","SABATO","DOMENICA"),
c( 1, 2, 3, 4, 5, 6, 7))
sleep.long$day <- as.factor(sleep.long$day)
# order by ID and day
sleep.long <- sleep.long[order(sleep.long$ID,sleep.long$day),]
rownames(sleep.long) <- 1:nrow(sleep.long)
sleep.long[,c("ID","day","BT","WT","TIB","TST","SE","SOL","WAKE","SE","WASO")]
# sanity check
sleep.long[1,"TST"] == sleep[sleep$ID=="EM01","TST_LUNEDI"]
## [1] TRUE
# sanity check
sleep.long[sleep.long$ID=="EM04" & sleep.long$day==3,"SE"] == sleep[sleep$ID=="EM04","SE_MERCOLEDI"]
## [1] TRUE
# sanity check
sleep.long[sleep.long$ID=="MZ55" & sleep.long$day==5,"WASO"] == sleep[sleep$ID=="MZ55","WASO_VENERDI"]
## [1] TRUE
# take only columns of interest
sleep.long <- sleep.long[,c("ID","day","BT","WT","TIB","TST","SOL","WAKE","SE","WASO","NAP",
colnames(sleep.long)[c(5:23,30:43)])]
Comments:
Then, we create the new variable week.time, with value “wd” for weekdays (Sunday to Thursday) and “we” for weekend (Friday to Saturday).
sleep.long$week.time <- "wd"
sleep.long[sleep.long$day==5|sleep.long$day==6,"week.time"] <- "we"
sleep.long$week.time <- as.factor(sleep.long$week.time)
sleep.long <- sleep.long[,c("ID","day","week.time",
colnames(sleep.long)[3:(ncol(sleep.long)-1)])]
# sanity check
sleep.long[,c("ID","day","week.time")]
Then, we re-codify Wake Time (WT, i.e., the hours between midnight and lights on), which is currently referred to the subsequent day (e.g., Friday WT is Saturday WT, Saturday WT is Sunday WT).
# adjusting wake time (WT) - expressed as hours from 00:00 --> referred to the subsequent day!
WT <- sleep.long[,c("ID","day","WT")]
WT <- WT %>%
group_by(ID) %>%
mutate(WT.lag = dplyr::lead(WT, n = 1))
# make SUNDAY WT as that previously indicated on SATURDAY
IDs <- levels(as.factor(WT$ID))
for(ID in IDs){
WT[WT$ID==ID & WT$day==7,"WT.lag"] <- WT[WT$ID==ID & WT$day==1,"WT"]
}
WT # sanity check
sleep.long$WT.dayAfter <- as.numeric(WT$WT.lag)
mean(na.omit(sleep.long[sleep.long$week.time=="wd","WT.dayAfter"]))
## [1] 8.516297
mean(na.omit(sleep.long[sleep.long$week.time=="we","WT.dayAfter"]))
## [1] 9.398926
Finally, we express TST and TIB in hours, instead of minutes.
sleep.long$TST <- sleep.long$TST/60
sleep.long$TIB <- sleep.long$TIB/60
Here the meaning of the variables of interest is explained.
sleep.long
ID is the identification code of each participant
day is the number of the night where sleep measures are recorded in a given participant. It ranges from 1 (Monday) to 7 (Sunday)
week.time indicates if the considered day is a weekday (wd) or weekend (we)
BT is bedtime, encoded as the number of hours between the previous midnight and the time when participants started sleeping
WT is wake time, encoded as the number of hours between the previous midnight and the time when participants woke up
WT.dayAfter is WT referred to the following day
TIB is total time in bed, encoded as the minutes between BT and WT
TST is total sleep time, encoded as the minutes of effective sleep between BT and WT
SE is sleep efficiency, eexpressed as the percentage of sleep time over TIB (SE = 100*TST/TIB)
SOL is sleep onset latency, encoded as the minutes between BT and the first sleep epoch (sleep onset)
WASO is wake after sleep onset, encoded as the total minutes of wake after sleep onset
NAP indicates if a nap was taken (1) or not (0) by a given participant in a given day
SEX is participants sex
CAFFE_week is the average number of coffees consumed by a given participant each week
FUMO..SI.NO. indicates if a given participant is a smoker (N = 24, 29%) or not
ALCOL_WEEK is the average number of alcoholic units consumed by a given participant each week
MEQ.R is the MEQ-r score of a given participant, indicating his/her circadian preferences
BDI.II is the BDI-II score of a given participant, indicating his/her depressive symptoms
PSQI is the PSQI score of a given participant, indicating his/her preceived sleep problems
Here, our goal is to describe the variables of interest with a focus on intra-individual variability. Specifically, we compute the intraclass correlation coefficients (ICCs) to estimate how the variance is distributed on both the within and the between levels. The ICCs range from 0 (all variance is within participants) to 1 (all variance is between participants).
library(ICC); library(Rmisc); library(knitr); library(dplyr); library(ggplot2); library(knitr)
windowsFonts(CMU=windowsFont("CMU Serif Roman"),time=windowsFont("Times New Roman"))
desc.table <- data.frame(measure=as.character(c("BT","WT.dayAfter","TIB","TST","SE","SOL","WASO")),
mean=rep(NA,7),sd=rep(NA,7),
mean_F=rep(NA,7),sd_F=rep(NA,7),mean_M=rep(NA,7),sd_M=rep(NA,7),
mean_WD=rep(NA,7),sd_WD=rep(NA,7),mean_WE=rep(NA,7),sd_WD=rep(NA,7),ICC=rep(NA,7))
desc.table$measure <- as.character(desc.table$measure)
for(i in 1:nrow(desc.table)){
desc.table[i,2:3] <- round(Rmisc::summarySE(sleep.long, measurevar=desc.table[i,"measure"],
na.rm=TRUE)[3:4],2)
desc.table[i,4:5] <- round(Rmisc::summarySE(sleep.long, measurevar=desc.table[i,"measure"],
groupvars="SEX",na.rm=TRUE)[1,3:4],2)
desc.table[i,6:7] <- round(Rmisc::summarySE(sleep.long, measurevar=desc.table[i,"measure"],
groupvars="SEX",na.rm=TRUE)[2,3:4],2)
desc.table[i,8:9] <- round(Rmisc::summarySE(sleep.long, measurevar=desc.table[i,"measure"],
groupvars="week.time",na.rm=TRUE)[1,3:4],2)
desc.table[i,10:11] <- round(Rmisc::summarySE(sleep.long, measurevar=desc.table[i,"measure"],
groupvars="week.time",na.rm=TRUE)[2,3:4],2)
data4ICC <- sleep.long[,c("ID",desc.table[i,"measure"])]
desc.table[i,12] <- round(ICCest(x=ID,y=desc.table[i,"measure"],data=na.omit(data4ICC))$'ICC',2)
}
kable(desc.table,digits=2, format="pandoc", caption="Table 1: Descriptive Statistics of level-1 variables")
| measure | mean | sd | mean_F | sd_F | mean_M | sd_M | mean_WD | sd_WD | mean_WE | sd_WD.1 | ICC |
|---|---|---|---|---|---|---|---|---|---|---|---|
| BT | 25.19 | 1.72 | 24.96 | 1.64 | 25.46 | 1.79 | 24.99 | 1.55 | 25.68 | 2.02 | 0.38 |
| WT.dayAfter | 8.77 | 1.49 | 8.73 | 1.41 | 8.82 | 1.59 | 8.52 | 1.40 | 9.40 | 1.54 | 0.32 |
| TIB | 7.59 | 1.40 | 7.78 | 1.46 | 7.37 | 1.30 | 7.53 | 1.38 | 7.76 | 1.44 | 0.14 |
| TST | 6.56 | 1.28 | 6.78 | 1.34 | 6.30 | 1.16 | 6.50 | 1.25 | 6.72 | 1.33 | 0.20 |
| SE | 86.41 | 5.77 | 87.12 | 5.22 | 85.58 | 6.26 | 86.34 | 5.78 | 86.56 | 5.74 | 0.64 |
| SOL | 8.82 | 13.48 | 8.45 | 15.46 | 9.25 | 10.72 | 8.91 | 14.27 | 8.60 | 11.33 | 0.24 |
| WASO | 52.98 | 27.27 | 51.56 | 25.26 | 54.66 | 29.41 | 52.69 | 27.81 | 53.71 | 25.96 | 0.53 |
Comments:
descriptive statistics suggest that on average participants fall asleep around 01:00 a.m. and wake up around 8:45 a.m. On average, they spend 7.5 hours in bed, of which only 6.5 hours are effective sleep, with an average sleep efficiency of 86.42%. The average sleep onset latency is 9 minutes and the average wake time after sleep onset is one hour (both measures show high variability between participants).
descriptively, we observe no evident differences between weekdays and weekends, with the exception of TIB and TST.
all ICCs but those of WASO and SE are below .50, indicating that most variability in sleep measures is intra-individual variability, and thus low sleep consistency in the sample. The more stable sleep measures are WASO and SE, whereas the more varying measure, is TIB.
We also create an extended version of this table, showing means and SD by gender and week day.
desc.table2 <- data.frame(measure=rep(c("BT","WT.dayAfter","TIB","TST","SE","SOL","WASO"),each=2),
sex=rep(c("F","M"),7),
MONmean=rep(NA,14),MONsd=rep(NA,14),TUEmean=rep(NA,14),TUEsd=rep(NA,14),
WEDmean=rep(NA,14),WEDsd=rep(NA,14),THUmean=rep(NA,14),THUsd=rep(NA,14),
FRYmean=rep(NA,14),FRYsd=rep(NA,14),SATmean=rep(NA,14),SATsd=rep(NA,14),
SUNmean=rep(NA,14),SUNsd=rep(NA,14))
for(i in 1:nrow(desc.table2)){
stats <- Rmisc::summarySE(sleep.long,measurevar=as.character(desc.table2[i,"measure"]),
groupvars=c("SEX","day"),na.rm=TRUE)
desc.table2[i,c("MONmean","MONsd")] <- round(stats[stats$day==1&stats$SEX==desc.table2[i,"sex"],4:5],2)
desc.table2[i,c("TUEmean","TUEsd")] <- round(stats[stats$day==2&stats$SEX==desc.table2[i,"sex"],4:5],2)
desc.table2[i,c("WEDmean","WEDsd")] <- round(stats[stats$day==3&stats$SEX==desc.table2[i,"sex"],4:5],2)
desc.table2[i,c("THUmean","THUsd")] <- round(stats[stats$day==4&stats$SEX==desc.table2[i,"sex"],4:5],2)
desc.table2[i,c("FRYmean","FRYsd")] <- round(stats[stats$day==5&stats$SEX==desc.table2[i,"sex"],4:5],2)
desc.table2[i,c("SATmean","SATsd")] <- round(stats[stats$day==6&stats$SEX==desc.table2[i,"sex"],4:5],2)
desc.table2[i,c("SUNmean","SUNsd")] <- round(stats[stats$day==7&stats$SEX==desc.table2[i,"sex"],4:5],2)
}
kable(desc.table2,digits=2, format="pandoc", caption="Table 1: Descriptive Statistics by gender and day")
| measure | sex | MONmean | MONsd | TUEmean | TUEsd | WEDmean | WEDsd | THUmean | THUsd | FRYmean | FRYsd | SATmean | SATsd | SUNmean | SUNsd |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BT | F | 24.70 | 1.42 | 24.66 | 1.06 | 24.85 | 1.44 | 24.92 | 1.51 | 25.19 | 1.58 | 25.75 | 2.42 | 24.63 | 1.52 |
| BT | M | 25.09 | 1.47 | 25.16 | 1.41 | 25.37 | 1.74 | 25.16 | 1.36 | 25.81 | 2.11 | 26.06 | 1.82 | 25.55 | 2.26 |
| WT.dayAfter | F | 8.28 | 1.15 | 8.11 | 1.09 | 8.68 | 1.25 | 8.86 | 1.60 | 9.11 | 1.51 | 9.47 | 1.43 | 8.60 | 1.34 |
| WT.dayAfter | M | 8.49 | 1.28 | 8.17 | 1.59 | 8.50 | 1.44 | 8.49 | 1.30 | 9.34 | 1.82 | 9.72 | 1.37 | 8.97 | 1.79 |
| TIB | F | 7.48 | 1.56 | 7.51 | 1.31 | 7.93 | 1.17 | 7.76 | 1.73 | 8.01 | 1.40 | 7.88 | 1.53 | 7.91 | 1.43 |
| TIB | M | 7.38 | 1.24 | 7.03 | 1.13 | 7.10 | 1.18 | 7.38 | 1.35 | 7.47 | 1.25 | 7.63 | 1.55 | 7.60 | 1.34 |
| TST | F | 6.54 | 1.49 | 6.53 | 1.16 | 6.86 | 1.01 | 6.78 | 1.54 | 7.02 | 1.42 | 6.85 | 1.33 | 6.92 | 1.33 |
| TST | M | 6.23 | 1.10 | 6.00 | 1.01 | 6.10 | 0.91 | 6.30 | 1.16 | 6.46 | 1.15 | 6.50 | 1.35 | 6.49 | 1.34 |
| SE | F | 87.14 | 5.22 | 87.06 | 5.21 | 86.58 | 4.52 | 87.28 | 5.05 | 87.29 | 7.00 | 87.08 | 4.56 | 87.39 | 4.89 |
| SE | M | 84.67 | 6.55 | 85.54 | 6.89 | 86.44 | 6.00 | 85.54 | 5.73 | 86.45 | 5.51 | 85.25 | 5.58 | 85.15 | 7.57 |
| SOL | F | 9.05 | 12.18 | 7.98 | 10.48 | 9.41 | 19.69 | 10.14 | 27.77 | 8.16 | 10.53 | 7.66 | 10.23 | 6.80 | 8.14 |
| SOL | M | 8.78 | 8.75 | 8.08 | 7.78 | 7.97 | 9.17 | 10.16 | 11.39 | 8.53 | 14.03 | 10.29 | 10.67 | 10.89 | 12.23 |
| WASO | F | 47.91 | 21.91 | 50.91 | 24.70 | 54.95 | 23.43 | 48.95 | 29.15 | 50.98 | 24.33 | 54.16 | 28.41 | 53.02 | 24.99 |
| WASO | M | 59.81 | 33.55 | 53.76 | 32.51 | 51.78 | 26.64 | 54.37 | 27.03 | 52.55 | 25.30 | 57.50 | 26.05 | 52.87 | 34.80 |
Finally, we can take a look at the variables on level 2 (between-individuals):
gridExtra::grid.arrange(ggplot(sleep,aes(SEX))+geom_bar()+ggtitle("SEX"),
ggplot(sleep,aes(round(AGE,0)))+geom_bar()+ggtitle("AGE"),
ggplot(sleep,aes(FUMO..SI.NO.))+geom_bar()+ggtitle("SMOKE"),
ggplot(sleep,aes(ALCOL..SI.NO.))+geom_bar()+ggtitle("ALCOHOL"))
desc2 <- rbind(
sleep[sleep$CAFFE..SI.NO.==1,] %>%
dplyr::select(CAFFE_week) %>%
gather("Variable", "value") %>%
group_by(Variable) %>%
summarise(Mean=round(mean(value, na.rm=TRUE),2),
SD=round(sd(value, na.rm=TRUE),2),
min=round(min(value, na.rm=TRUE), 2),
max=round(max(value, na.rm=TRUE),2),
N=length(which(!is.na(value)))),
sleep[sleep$FUMO..SI.NO.==1,] %>%
dplyr::select(FUMO_Giorno) %>%
gather("Variable", "value") %>%
group_by(Variable) %>%
summarise(Mean=round(mean(value, na.rm=TRUE),2),
SD=round(sd(value, na.rm=TRUE),2),
min=round(min(value, na.rm=TRUE), 2),
max=round(max(value, na.rm=TRUE),2),
N=length(which(!is.na(value)))),
sleep[sleep$ALCOL..SI.NO.==1,] %>%
dplyr::select(ALCOL_week) %>%
gather("Variable", "value") %>%
group_by(Variable) %>%
summarise(Mean=round(mean(value, na.rm=TRUE),2),
SD=round(sd(value, na.rm=TRUE),2),
min=round(min(value, na.rm=TRUE), 2),
max=round(max(value, na.rm=TRUE),2),
N=length(which(!is.na(value)))),
sleep %>%
dplyr::select(c(MEQ.r:PSQI)) %>%
gather("Variable", "value") %>%
group_by(Variable) %>%
summarise(Mean=round(mean(value, na.rm=TRUE),2),
SD=round(sd(value, na.rm=TRUE),2),
min=round(min(value, na.rm=TRUE), 2),
max=round(max(value, na.rm=TRUE),2),
N=length(which(!is.na(value)))))
kable(desc2,digits=2, format="pandoc", caption="Table 2: Descriptive Statistics of level-2 variables")
| Mean | SD | min | max | N |
|---|---|---|---|---|
| 12.43 | 7.46 | 1.0 | 35 | 66 |
| 6.54 | 5.33 | 1.0 | 20 | 24 |
| 4.08 | 3.65 | 0.5 | 15 | 72 |
| 9.84 | 5.92 | 0.0 | 38 | 246 |
gridExtra::grid.arrange(ggplot(sleep,aes(BDI.II))+geom_bar()+ggtitle("BDI-II"),
ggplot(sleep,aes(round(PSQI,0)))+geom_bar()+ggtitle("PSQI"),
ggplot(sleep,aes(MEQ.r))+geom_bar()+ggtitle("MEQ.r"),nrow=3)
ggplot(sleep,aes(x=PSQI,fill=SEX))+
geom_histogram(color="white",position="identity",binwidth = 1)+
geom_vline(xintercept=5.5,lty=2,lwd=1)+
scale_fill_manual(name="Gender",
values=c("lightgray",rgb(.08, .08, .08,alpha=.5)),
labels=c("Women","Men"))+
theme_classic()+labs(x="PSQI scores",y="Frequency")+
theme(text=element_text(size=20,face="bold",family="time"))
Comments:
the sample is gender-balanced, but it is mainly composed by non-smokers (N = 58, 70%) and alcohol users (N = 72, 88%).
the sample is also homogeneous in terms of age, with the exception of one 35-year-old participant (could be an influential case!)
on average, participants consume 12 coffees per week, with smokers reporting to smoke an average number of 6.5 cigarettes per day and alcohol users reporting to drink an average number of 4 alcoholic units per week
the average BDI-II score is 10.22, which is below the suggested cutoff of 13. Only 22 participants (27%) show a score higher than 13.
the average MEQ-r score is 13.23, suggesting that the average participant reports an “intermediate” circadian preference, with 6 (7%) “early birds” (preferring earlier bed and wake time) and 22 (27%) “night owls” (preferring later bed and wake time)
the average PSQI score is 6.06, which is slightly above the suggested cut-off of 5, suggesting the presence of a relatively high number of participants reporting sleep disturbances. Indeed, 44 participants (54%) have a PSQI score higher than 5
Here, the variables’ distributions are further examined by a graphical inspection. A first data is related to sleep consistency and it can be shown in terms of the standard deviation of each considered sleep metric.
library(ggplot2); library(extrafont); library(cowplot); library(Rmisc)
# function for geom_flat_violin
source("https://gist.githubusercontent.com/benmarwick/2a1bb0133ff568cbe28d/raw/fb53bd97121f7f9ce947837ef1a4c65a73bffb3f/geom_flat_violin.R")
# metrics standard deviations
gridExtra::grid.arrange(ggplot(sleep,aes(x=WT_SD))+
geom_histogram(bins=40) + ggtitle("Wake Time") + xlab("St. Dev") +
theme(text=element_text(family="CMU")),
ggplot(sleep,aes(x=BT_SD))+
geom_histogram(bins=40) + ggtitle("Bed Time") + xlab("St. Dev") +
theme(text=element_text(family="CMU")),
ggplot(sleep,aes(x=TIB_SD))+
geom_histogram(bins=40) + ggtitle("Time in Bed") + xlab("St. Dev")+
theme(text=element_text(family="CMU")),
ggplot(sleep,aes(x=TST_SD))+
geom_histogram(bins=40) + ggtitle("Total Sleep Time") + xlab("St. Dev")+
theme(text=element_text(family="CMU")),
ggplot(sleep,aes(x=SOL_SD))+
geom_histogram(bins=40) + ggtitle("Sleep Onset Time") + xlab("St. Dev")+
theme(text=element_text(family="CMU")),
ggplot(sleep,aes(x=SE_SD))+
geom_histogram(bins=40) + ggtitle("Sleep Efficiency") + xlab("St. Dev")+
theme(text=element_text(family="CMU")),
ggplot(sleep,aes(x=WASO_SD))+
geom_histogram(bins=40) + ggtitle("Wake After Sleep Onset") +
xlab("St. Dev")+ theme(text=element_text(family="CMU")),nrow=2)
Comments:
we can observe a degree of variability in each variable, with most cases ranging between 50 and 70 minutes for sleep and wake time, time in bed and total sleep time, and between 10 and 20 minutes for sleep onset latency and wake after sleep onset.
the most consistent variable is sleep efficiency, which shows an average SD around 3%.
Then, we can take a look at the day-by-day variability within each participant. Below, each variable is plotted day-by-day for each participant and considering the whole sample.
sleep.long$day <- as.numeric(sleep.long$day)
# TIB AND TST by participant
ggplot(sleep.long,aes(x=day,y=TIB)) +
geom_smooth(se=FALSE,span=0.7,colour="black") +
geom_point(size=1,colour="black") +
geom_smooth(aes(y=TST),colour="#99c2ff",se=FALSE,span=0.7) + # insoddisfatto
geom_point(aes(y=TST),size=1,colour="#99c2ff") +
geom_vline(aes(xintercept=5.5))+
facet_wrap("ID",strip.position="right") +
ggtitle("Time in bed (black) and Total Sleep Time (blue)")+ylab("TST (min)")+
theme(axis.text = element_text(size=5),text=element_text(family="CMU"))+
scale_x_continuous(breaks = c(2,4,6),labels=c("TUE","THU","SAT"))
# TIB AND TST distributions
ggplot(sleep.long,aes(x=as.factor(as.character(day)),y=TST,fill=as.factor(as.character(day))))+
geom_flat_violin(position = position_nudge(x = .2, y = 0),adjust =2,size=1.5)+
geom_flat_violin(aes(y=TIB),fill="gray",position = position_nudge(x = .2, y = 0),adjust =2,size=1.5,alpha=.5)+
geom_point(aes(col=as.factor(as.character(day))),position = position_jitter(width = .15),size=2)+
geom_boxplot(aes(x = as.factor(as.character(day)), y = TST, fill = as.factor(as.character(day))),size=1.5,
outlier.shape = NA, alpha = .5, width = .1, colour = "black")+
labs(x="",y="TST (min)")+ggtitle("Total Sleep Time and Time in bed (in gray)")+
scale_x_discrete(labels=c("MON","TUE","WED","THU","FRY","SAT","SUN"))+
theme(text = element_text(family = "CMU",size=18,face="bold"),legend.position = "none")
# SE by participants
ggplot(sleep.long,aes(x=day,y=SE)) +
geom_smooth(se=FALSE,span=0.7,colour="black") +
geom_point(size=1,colour="black") +
geom_vline(aes(xintercept=5.5))+
facet_wrap("ID",strip.position="right") +
ggtitle("Sleep Efficiency")+ylab("SE (%)")+
theme(axis.text = element_text(size=5),text=element_text(family="CMU"))+
scale_x_continuous(breaks = c(2,4,6),labels=c("TUE","THU","SAT"))
# SE distributions
ggplot(sleep.long,aes(x=as.factor(as.character(day)),y=SE,fill=as.factor(as.character(day))))+
geom_flat_violin(position = position_nudge(x = .2, y = 0),adjust =2,size=1.5)+
geom_point(aes(col=as.factor(as.character(day))),position = position_jitter(width = .15),size=2)+
geom_boxplot(aes(x = as.factor(as.character(day)), y = SE, fill = as.factor(as.character(day))),size=1.5,
outlier.shape = NA, alpha = .5, width = .1, colour = "black")+
labs(x="",y="SE (%)")+ggtitle("Sleep Efficiency")+
scale_x_discrete(labels=c("MON","TUE","WED","THU","FRY","SAT","SUN"))+
theme(text = element_text(family = "CMU",size=18,face="bold"),legend.position = "none")
# WASO and SOL by participant
ggplot(sleep.long,aes(x=day,y=WASO)) +
geom_smooth(se=FALSE,span=0.7,colour="darkred") +
geom_point(size=1,colour="darkred") +
geom_smooth(aes(y=SOL),se=FALSE,span=0.7,colour="salmon") +
geom_point(aes(y=SOL),size=1,colour="salmon") +
geom_vline(aes(xintercept=5.5))+
facet_wrap("ID",strip.position="right") +
ggtitle("Wake After Sleep Onset (dark red) and Sleep Onset Latency (light red)")+ylab("WASO and SOL (min)")+
theme(axis.text = element_text(size=5),text=element_text(family="CMU"))+
scale_x_continuous(breaks = c(2,4,6),labels=c("TUE","THU","SAT"))
# WASO distributions
ggplot(sleep.long,aes(x=as.factor(as.character(day)),y=WASO,fill=as.factor(as.character(day))))+
geom_flat_violin(position = position_nudge(x = .2, y = 0),adjust =2,size=1.5)+
geom_point(aes(col=as.factor(as.character(day))),position = position_jitter(width = .15),size=2)+
geom_boxplot(aes(x = as.factor(as.character(day)), y = WASO, fill = as.factor(as.character(day))),size=1.5,
outlier.shape = NA, alpha = .5, width = .1, colour = "black")+
labs(x="",y="WASO (min)")+ggtitle("Wake After Sleep Onset")+
scale_x_discrete(labels=c("MON","TUE","WED","THU","FRY","SAT","SUN"))+
theme(text = element_text(family = "CMU",size=18,face="bold"),legend.position = "none")
# SOL distributions
ggplot(sleep.long[!is.na(sleep.long$SOL)&sleep.long$SOL<100,],aes(x=as.factor(as.character(day)),y=SOL,fill=as.factor(as.character(day))))+
geom_flat_violin(position = position_nudge(x = .2, y = 0),adjust =2,size=1.5)+
geom_point(aes(col=as.factor(as.character(day))),position = position_jitter(width = .15),size=2)+
geom_boxplot(aes(x = as.factor(as.character(day)), y = SOL, fill = as.factor(as.character(day))),size=1.5,
outlier.shape = NA, alpha = .5, width = .1, colour = "black")+
labs(x="",y="SOL (min)")+ggtitle("Sleep Onset Latency")+
scale_x_discrete(labels=c("MON","TUE","WED","THU","FRY","SAT","SUN"))+
theme(text = element_text(family = "CMU",size=18,face="bold"),legend.position = "none")
# WT distributions
ggplot(sleep.long,aes(x=as.factor(as.character(day)),y=WT,fill=as.factor(as.character(day))))+
geom_flat_violin(position = position_nudge(x = .2, y = 0),adjust =2,size=1.5)+
geom_point(aes(col=as.factor(as.character(day))),position = position_jitter(width = .15),size=2)+
geom_boxplot(aes(x = as.factor(as.character(day)), y = WT, fill = as.factor(as.character(day))),size=1.5,
outlier.shape = NA, alpha = .5, width = .1, colour = "black")+
labs(x="",y="")+ggtitle("Wake time")+
scale_x_discrete(labels=c("MON","TUE","WED","THU","FRY","SAT","SUN"))+
scale_y_continuous(breaks=c(4,8,12,16),labels=c("04:00","08:00","12:00","16:00"))+
theme(text = element_text(family = "CMU",size=18,face="bold"),legend.position = "none")
# BT distributions
ggplot(sleep.long,aes(x=as.factor(as.character(day)),y=BT-24,fill=as.factor(as.character(day))))+
geom_flat_violin(position = position_nudge(x = .2, y = 0),adjust =2,size=1.5)+
geom_point(aes(col=as.factor(as.character(day))),position = position_jitter(width = .15),size=2)+
geom_boxplot(aes(x = as.factor(as.character(day)), y = BT-24, fill = as.factor(as.character(day))),size=1.5,
outlier.shape = NA, alpha = .5, width = .1, colour = "black")+
labs(x="",y="")+ggtitle("Bed time")+
scale_x_discrete(labels=c("MON","TUE","WED","THU","FRY","SAT","SUN"))+
scale_y_continuous(breaks=c(0,5,10),labels=c("00:00","05:00","10:00"))+
theme(text = element_text(family = "CMU",size=18,face="bold"),legend.position = "none")
# mean values (WT.dayAfter is wake time on the day after)
sleep.long$day <- factor(sleep.long$day,levels=c(7,6,5,4,3,2,1))
means <- summarySE(sleep.long,measurevar="WT.dayAfter",groupvars="day",na.rm=TRUE)
means$BT <- summarySE(sleep.long,measurevar="BT",groupvars="day",na.rm=TRUE)[,3]
means$BT.sd <- summarySE(sleep.long,measurevar="BT",groupvars="day",na.rm=TRUE)[,4]
means$BT.se <- summarySE(sleep.long,measurevar="BT",groupvars="day",na.rm=TRUE)[,5]
means <- means[order(means$day,decreasing=TRUE),]
# plotting BT and following WT
ggplot(sleep.long,aes(x=day,y=BT-24))+
geom_flat_violin(adjust =2,size=0,alpha=.5,fill="gray")+
geom_flat_violin(aes(y=WT.dayAfter),
adjust =2,size=0,alpha=.5,fill="gray")+
geom_segment(data=means,aes(x=day,xend=day,
y=WT.dayAfter,yend=BT-24),size=2,colour="black")+
geom_point(data=means,aes(x=day,y=WT.dayAfter),size=6,shape=21,fill="gray")+
geom_point(data=means,aes(x=day,y=BT-24),size=6,shape=22,fill="gray")+
labs(x="",y="")+
scale_x_discrete(labels=c("SUN","SAT","FRY","THU","WED","TUE","MON"))+
scale_y_continuous(breaks=c(0,2.5,5,7.5,10,12.5),limits=c(-1,11),
labels=c("00:00","02:30","05:00","07:30","10:00","12:30"))+
theme_classic()+
theme(text = element_text(family = "time",face="bold",size=18),legend.position = "top")+
coord_flip()
Comments:
We observe no clear differences between weekdays and weekend for TIB, TST, SE, WASO, and SOL.
In contrast, slightly higher WT and BT values are showed on Saturday compared to other days, suggesting that participants fall asleep later and wake up later on Saturday.
We can notice one outlier (EM39) for WT on day 4 (Thursday).
The latter graph suggest that both Wednesday and Saturday show slightly higher variability compared to other days.
In this section, we assess the goodness of fit (GOF) of each model specified in the “LMER MODELS” section. Specifically, we focus on the most complex model speciefied per each sleep measure, and we evaluate (A) the normality of resuduals distribution, (B) the normality of random effects distribution, and (C) the independence of residuals from fitted values (homoscedasticity).
library(lme4); library(jtools); library(effects) # modeling
library(fitdistrplus) ; library(car) ; library(lattice) # gof
For each sleep metric, we assess the GOF of the most complex model (including all predictors of interest) and we assess the following LMER models’ assumptions:
independence between residuals and fitted values,
the normality of both residuals’ and random effects’ distribution,
that random effects are centered on zero,
Typically, TST shows a symmetrical shape that can be modeled with a normal distribution (as shown in the GRAPHICS section).
m <- lmer(TIB~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),sleep.long)
# residuals normal fit (good)
plot(fitdist(resid(m),distr="norm",method="mle"))
# random effects normal fit (good)
plot(fitdist(ranef(m)$'ID'[[1]],distr="norm",method="mle"))
# LMER assumptions (good)
gridExtra::grid.arrange(plot(m,main="Resid vs Fitted"),
qqmath(residuals(m),main="Residual QQPlot"),
dotplot(ranef(m))[[1]],
plot(ranef(m),main="Random Effects QQplot")[[1]],
nrow=2)
Comments:
The normal distribution shows a good fit on TIB’s residuals and random effects.
All other LMER assumptions seem to be fulfilled.
Conclusions:
We can model TIB under the normality assumption.
Like TIB, TST typically shows a symmetrical shape that can be modeled with a normal distribution (as shown in the GRAPHICS section).
m <- lmer(TST~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),sleep.long)
# residuals normal fit (good)
plot(fitdist(resid(m),distr="norm",method="mle"))
# random effects normal fit (good)
plot(fitdist(ranef(m)$'ID'[[1]],distr="norm",method="mle"))
# LMER assumptions (good)
gridExtra::grid.arrange(plot(m,main="Resid vs Fitted"),
qqmath(residuals(m),main="Residual QQPlot"),
dotplot(ranef(m))[[1]],
plot(ranef(m),main="Random Effects QQplot")[[1]],
nrow=2)
Comments:
The normal distribution shows an excellent fit on TST’s residuals and a good fit on random effects.
All other LMER assumptions seem to be fulfilled.
Conclusions:
We can model TST under the normality assumption.
SE is bounded between 0 (when TST = 0) and 100 (when TST = TIB). Thus, a beta regression could be the best option, but at the time of these analyses no packages able to deal with mixed-effects beta regression were available (to our knowledge).
In our sample, only 4 on 82 showed an SE lower than 70%: EM07 (night 5), EM22 (night 36), EM36 (night 7) and EM53 (all 7 nights), whereas only one participant showed an SE = 99% and no participants showed a 100% SE.
# SE < 70%
sleep.long[!is.na(sleep.long$SE)&sleep.long$SE<70,
c("ID","day","SE","TIB","TST","WASO","SOL")]
mean(sleep.long[sleep.long$ID=="MZ53","SE"])
## [1] 61.77
mean(sleep.long[sleep.long$ID=="MZ53","WASO"])
## [1] 148.2857
# SE = 99%
sleep.long[!is.na(sleep.long$SE)&sleep.long$SE==99,
c("ID","day","SE","TIB","TST","WASO","SOL")]
The frequency distribution shows a quite normal shape, which is symmetric and centered approximately on 87-88%. The issue of modeling these data using a normal distribution is that the models would predict values above 100%, whereas TST cannot be higher than TIB (thus, SE cannot be higher than 100%).
hist(sleep.long$SE,breaks=100)
With no clues on how to model the data differently, we accept this limitation and we proceed by evaluating the fit of the normal distribution. Indeed, SE can be seen as a sort of transformation of TST, which can be reasonably modeled assuming residuals normality. SE results should confirm those obtained on TST, and we could focus our goal on assessing the impact of our predictors of interest on SE, instead of predicting meaningful SE values.
m <- lmer(SE~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),sleep.long)
# residuals normal fit (acceptable)
plot(fitdist(resid(m),distr="norm",method="mle"))
# random effects normal fit (acceptable)
plot(fitdist(ranef(m)$'ID'[[1]],distr="norm",method="mle"))
# LMER assumptions (acceptable)
gridExtra::grid.arrange(plot(m,main="Resid vs Fitted"),
qqmath(residuals(m),main="Residual QQPlot"),
dotplot(ranef(m))[[1]],
plot(ranef(m),main="Random Effects QQplot")[[1]],
nrow=2)
Comments:
The normal distribution does not show a very good fit on SE’s residuals or random effects. This is especially due to the four outliers (EM07, EM22, EM36 and MZ53) showing SE lower than 70% (2+ hours of WASO, 2 to 5 hours of TST).
All other LMER assumptions seem to be fulfilled (residuals are independent of fitted values, and random effects are normally distributed and centered on zero).
Let’s see what happens to residuals normality if we exclude the two outliers:
m <- lmer(SE~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),
sleep.long[sleep.long$ID!="EM07"&sleep.long$ID!="EM22"&sleep.long$ID!="EM36"&sleep.long$ID!="MZ53",]) # excluding outliers
# residuals normal fit (acceptable)
plot(fitdist(resid(m),distr="norm",method="mle"))
Comments:
We can also try to log-transform the data considering the whole sample.
m <- lmer(log(SE)~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),data=sleep.long)
# residuals normal fit (worse)
plot(fitdist(resid(m),distr="norm",method="mle"))
Comments:
Conclusions:
Predicting SE scores assuming a normal residuals distribution implies that SE could be above 100% or (with much less probability) below 0%, which cannot be the case. However, with no clues on alternative ways to model our data, the normal distribution seems to be the better way. Looking at the data, we evaluate the normal fit on SE residuals as not excellent but at least acceptable. Removing four outliers would improve the fit.
SOL is a variable bounded on zero, typically skewed on the right, with few cases showing more than 1 hour of latency. Thus, we can try to log transform it to fit a model under normality, or we might use the Gamma family, which is bounded on zero.
# normal fit on log-transformed SOL (a constant is added to avoid cases of log(0))
m <- lmer(log(SOL+.05)~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),sleep.long)
# residuals normal fit (acceptable?)
plot(fitdist(resid(m),distr="norm",method="mle"))
# random effects normal fit (good)
plot(fitdist(ranef(m)$'ID'[[1]],distr="norm",method="mle"))
# LMER assumptions (acceptable)
gridExtra::grid.arrange(plot(m,main="Res. vs Fitted"),
qqmath(residuals(m),main="Residual QQPlot"),
dotplot(ranef(m))[[1]],
plot(ranef(m),main="Random Effects QQplot")[[1]],
nrow=2)
Comments:
the fit of the normal distribution is not very good but acceptable
all other LMER assumptions seem to be fulfilled (residuals are independent of fitted values, and random effects are normally distributed and centered on zero)
Then, we check the fit of the Gamma family.
m <- glmer(SOL+.05~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),sleep.long,
family="Gamma")
plot(fitdist(resid(m),distr="norm",method="mle"))
Conclusions:
The fit of the Gamma family on raw SOL data is worse than that of the normal distribution on log-transformed data. The best option seems to log-transform.
As TST, WASO usually shows a symmetrical shape that can be modeled with a normal distribution (as shown in the GRAPHICS section).
m <- lmer(WASO~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),sleep.long)
# residuals normal fit (good)
plot(fitdist(resid(m),distr="norm",method="mle"))
# random effects normal fit (good)
plot(fitdist(ranef(m)$'ID'[[1]],distr="norm",method="mle"))
# LMER assumptions (good)
gridExtra::grid.arrange(plot(m,main="Resid vs Fitted"),
qqmath(residuals(m),main="Residual QQPlot"),
dotplot(ranef(m))[[1]],
plot(ranef(m),main="Random Effects QQplot")[[1]],
nrow=2)
Comments:
the normal distribution shows a good fit on WASO’s residuals and a good fit on random effects
all other LMER assumptions seem to be fulfilled
Conclusions:
We can model WASO under the normality assumption.
We expect to observe a symmetric shape also for BT.
m <- lmer(BT~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),sleep.long)
# residuals normal fit (acceptable)
plot(fitdist(resid(m),distr="norm",method="mle"))
# random effects normal fit (good)
plot(fitdist(ranef(m)$'ID'[[1]],distr="norm",method="mle"))
# LMER assumptions (good)
gridExtra::grid.arrange(plot(m,main="Resid vs Fitted"),
qqmath(residuals(m),main="Residual QQPlot"),
dotplot(ranef(m))[[1]],
plot(ranef(m),main="Random Effects QQplot")[[1]],
nrow=2)
# outliers
sleep.long[!is.na(sleep.long$BT)&sleep.long$BT>30,c("ID","day","BT","WT")]
Comments:
the fit of the normal distribution is not very good but acceptable
we can observe some outliers (especially EM22 and MZ36) showed BT higher than 34
all other LMER assumptions seem to be fulfilled (residuals are independent of fitted values, and random effects are normally distributed and centered on zero)
Let’s see what happens if we exclude the two outliers:
m <- lmer(BT~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),
sleep.long[sleep.long$ID!="EM22"&sleep.long$ID!="MZ36",]) # excluding outliers
# residuals normal fit (slighlty better)
plot(fitdist(resid(m),distr="norm",method="mle"))
Comments:
Also a log transformation slighlty improves the normal fit, but the improvement is not substantial.
m <- lmer(log(BT)~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),sleep.long)
# residuals normal fit (slighlty better)
plot(fitdist(resid(m),distr="norm",method="mle"))
Conclusions:
The normal fit on BT is not excellent but at least acceptable, removing two outliers seems to improve the fit. There is no evident reason to log transform the data.
We expect to observe a symmetric shape also for WT.
m <- lmer(WT~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),sleep.long)
# residuals normal fit (good)
plot(fitdist(resid(m),distr="norm",method="mle"))
# random effects normal fit (good)
plot(fitdist(ranef(m)$'ID'[[1]],distr="norm",method="mle"))
# LMER assumptions (good)
gridExtra::grid.arrange(plot(m,main="Resid vs Fitted"),
qqmath(residuals(m),main="Residual QQPlot"),
dotplot(ranef(m))[[1]],
plot(ranef(m),main="Random Effects QQplot")[[1]],
nrow=2)
Comments:
the normal distribution fits well both residuals and random effects for WT
all other LMER assumptions seem to be fulfilled (residuals are independent of fitted values, and random effects are centered on zero)
Conclusions:
We can model WT under the normality assumption.
Influential cases are cases that strongly determine the size of the parameters estimated by a model, negatively influencing the statistical fit and generalizability of the model. Influential analysis is based on the principle that the iterative exclusion of single cases from the data considered by a target model should not produce substantially different parameters estimates (Nieuwenhuis, Te Grotenhuis & Pelzer, 2012).
In this section, we perform an influential analysis by using the Cook’s distance (CD), which indicates, for each observation i, the distance between the parameters vector of the model M and that of M-i. High CD values indicate that the observation has a strong influence on the parameters estimated by the model. A cut-off of 4/N is used to identify the most influential cases based on the CD.
For each sleep measure, we use the most complex model (with all predictors of interest) to proceed as follows:
we compute the Cook’s distance by considering a cut-off set to 4/N,
we re-estimate the parameters by removing all potentially influent cases identified with the CD, one-by-one,
if a given case shows to substantially influence the parameters estimation, we replicate the influential analysis without that case,
finally, we decide which cases can be removed.
# taking only ID and variables of interest (only continuous), specifying the model
data2check <- sleep.long[!is.na(sleep.long$TIB)&!is.na(sleep.long$week.time)&!is.na(sleep.long$SEX)&
!is.na(sleep.long$MEQ.r)&!is.na(sleep.long$PSQI)&!is.na(sleep.long$BDI.II)&
!is.na(sleep.long$ALCOL_week)&!is.na(sleep.long$FUMO..SI.NO.)&
!is.na(sleep.long$CAFFE_week),
c("ID","day","TIB","week.time","SEX","MEQ.r","PSQI","BDI.II","NAP",
"ALCOL_week","FUMO..SI.NO.","CAFFE_week")]
m <- lmer(TIB~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),data2check)
# computing Cook's dinstance and cutoff
cd <- cooks.distance(m)
data2check$cd <- cd
cutoff <- 4/length(levels(data2check$ID)) # 4/N
# selecting influential cases
index <- c(1:nrow(data2check)) # nrow
pm=cbind(data2check,cd,index)
pm=pm[order(pm$cd,decreasing = TRUE),] # order
pm$infl <- NA
for(i in 1:nrow(pm)){
if(pm[i,"cd"] >= cutoff){ pm[i,"infl"] <- 1 } else { pm[i,"infl"] <- 0 }
}
PM <- pm
# Plotting Cook's distance
par(mfrow=c(1,1),mar=c(6, 5, 4, 2) + 0.1)
plot(cd, xlab="Index",ylab="Cook's distance",
main=paste("Cook's Distance \ncut-off: 4/N =",round(cutoff,3),
" ( NInfl =",nrow(pm[pm$infl==1,]),")"),
pch=20,cex.lab=1.6,cex.axis=1.4,col.main="blue")
points(pm[pm$cd>=cutoff,"index"],
pm[pm$cd>=cutoff,"cd"],pch=20,cex=1.3,col="red")
text(pm[pm$cd>=cutoff,"index"]-10,
pm[pm$cd>=cutoff,"cd"]-0.005,
paste(pm[pm$cd>=cutoff,"ID"],pm[pm$cd>=cutoff,"day"]),cex=.8)
abline(h=cutoff,
lwd=1.8,col="blue",lty=2)
# participants showing cases above the cutoff
levels(as.factor(as.character(pm[pm$cd>cutoff,"ID"])))
## [1] "EM07" "EM09" "EM22" "EM42" "em51" "MZ04" "MZ22" "MZ25" "MZ34" "MZ39"
## [11] "MZ40" "MZ46" "MZ55" "MZ56"
Comments:
21 observations (i.e., combination of participant and day) due to 14 participants are more extreme than others and show a CD above the cutoff.
observations from MZ56 shows the highest Cook’s distance
In the next step, we re-estimate the parameters by excluding each potentially influential participant, one-by-one. We focus on substantial changes in the estimated parameters, such that the excluded cases implies a drastic change in the interpretation of the effects.
# parameters with all participants
parameters <- as.data.frame(round(summary(m)$coefficients,3))
parameters$ID <- "All"
IDs <- levels(as.factor(as.character(PM[PM$cd>cutoff,"ID"]))) # ID-day couples with CD > cutoff
for(ID in IDs){
new.data <- PM[PM$ID!=ID,]
m4 <- lmer(TIB~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),new.data)
new.data <- as.data.frame(round(summary(m4)$coefficients,3))
new.data$ID <- ID
parameters <- rbind(parameters,new.data)
}
parameters <- cbind(parameters,rep(c("Intercept","week.timewe","SEXM","MEQ.r","PSQI","BDI.II","NAP",
"ALCOL_week","FUMO..SI.NO.1","CAFFE_week"),
length(IDs)+1))
colnames(parameters) <- c("est","st.err","t","ID","pred")
parameters
# week.time (the code for the remaining parameters is not showed)
p1 <- ggplot(parameters[parameters$pred=="week.timewe",],aes(ID,est))+
geom_point()+ggtitle("week.time effect - original (red) vs. removing cases")+xlab("")+
geom_point(data=parameters[parameters$pred=="week.timewe"¶meters$ID=="All",],colour="red")
p2 <- ggplot(parameters[parameters$pred=="week.timewe",],aes(ID,t))+
geom_point()+ggtitle("week.time effect - original (red) vs. removing cases")+xlab("")+
geom_point(data=parameters[parameters$pred=="week.timewe"¶meters$ID=="All",],colour="red")
gridExtra::grid.arrange(p1,p2,nrow=2)
Comments:
Larger differences are shown by the parameters estimated for week.time (ranging from .20 to .28, with t-values ranging from 1.7 to 2.3) and Gender (ranging from -.39 to -.32, with t-values ranging from -.2.2 to -1.8).
The most influential participants are EM42 (whose exclusion leads to a decrease of .04 in the parameter estimated for week.time, whose t-value decreases from 2.0 to 1.7), and MZ56 (whose exclusion leads to a decrease of .03 in the parameter estimated for Gender, whose t-value increases in absolute value from 1.95 to 2.2)
None of the the other parameters seems to substantially change the interpretation of the results (i.e., non-significant effects remain non-significant, positive effects remain positive and negative effects remain negative).
Thus, we replicate the influential analysis by removing EM42 and MZ56 We start with the latter, which shows the highest Cook’s distance.
OUT <- data2check[data2check$ID!="MZ56",] # excluding influential case
m <- lmer(TIB~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),OUT)
# parameters with all participants
parameters <- as.data.frame(round(summary(m)$coefficients,3))
parameters$ID <- "All"
IDs <- levels(as.factor(as.character(OUT[OUT$cd>cutoff,"ID"]))) # ID-day couples with CD > cutoff
for(ID in IDs){
new.data <- OUT[OUT$ID!=ID,]
m4 <- lmer(TIB~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),new.data)
new.data <- as.data.frame(round(summary(m4)$coefficients,3))
new.data$ID <- ID
parameters <- rbind(parameters,new.data)
}
parameters <- cbind(parameters,rep(c("Intercept","week.timewe","SEXM","MEQ.r","PSQI","BDI.II","NAP",
"ALCOL_week","FUMO..SI.NO.1","CAFFE_week"),
length(IDs)+1))
colnames(parameters) <- c("est","st.err","t","ID","pred")
parameters
**Comments:*
the exclusion of MZ56 increases the effect of Gender, which becomes more consistent across the sample.
the role of week.time is still inconsistent, with participant EM42 showing the strongest influence (leading to an increase of .04 in the estimated parameter, with t-value increasing from 1.5 to 1.9).
none of the other parameters shows to be substantially influenced by the exclusion of MZ56.
Thus, we replicate the influential analysis by removing EM42
OUT <- OUT[OUT$ID!="EM42",] # excluding influential case
m <- lmer(TIB~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),OUT)
# parameters with all participants
parameters <- as.data.frame(round(summary(m)$coefficients,3))
parameters$ID <- "All"
IDs <- levels(as.factor(as.character(OUT[OUT$cd>cutoff,"ID"]))) # ID-day couples with CD > cutoff
for(ID in IDs){
new.data <- OUT[OUT$ID!=ID,]
m4 <- lmer(TIB~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),new.data)
new.data <- as.data.frame(round(summary(m4)$coefficients,3))
new.data$ID <- ID
parameters <- rbind(parameters,new.data)
}
parameters <- cbind(parameters,rep(c("Intercept","week.timewe","SEXM","MEQ.r","PSQI","BDI.II","NAP",
"ALCOL_week","FUMO..SI.NO.1","CAFFE_week"),
length(IDs)+1))
colnames(parameters) <- c("est","st.err","t","ID","pred")
parameters
Comments:
the exclusion of EM42 decreases the effect of Week time, which becomes more consistent across the sample.
none of the other parameters shows to be substantially influenced by the exclusion of EM42 (i.e., non-significant effects remain non-significant, positive effects remain positive and negative effects remain negative).
Conclusions:
We decide to perform the analyses on TIB without MZ56 and EM42.
# taking only ID and variables of interest (only continuous), specifying the model
data2check <- sleep.long[!is.na(sleep.long$TST)&!is.na(sleep.long$week.time)&!is.na(sleep.long$SEX)&
!is.na(sleep.long$MEQ.r)&!is.na(sleep.long$PSQI)&!is.na(sleep.long$BDI.II)&
!is.na(sleep.long$ALCOL_week)&!is.na(sleep.long$FUMO..SI.NO.)&
!is.na(sleep.long$CAFFE_week),
c("ID","day","TST","week.time","SEX","MEQ.r","PSQI","BDI.II","NAP",
"ALCOL_week","FUMO..SI.NO.","CAFFE_week")]
m <- lmer(TST~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),data2check)
# computing Cook's dinstance and cutoff
cd <- cooks.distance(m)
data2check$cd <- cd
cutoff <- 4/length(levels(data2check$ID)) # 4/N
# selecting influential cases
index <- c(1:nrow(data2check)) # nrow
pm=cbind(data2check,cd,index)
pm=pm[order(pm$cd,decreasing = TRUE),] # order
pm$infl <- NA
for(i in 1:nrow(pm)){
if(pm[i,"cd"] >= cutoff){ pm[i,"infl"] <- 1 } else { pm[i,"infl"] <- 0 }
}
PM <- pm
# Plotting Cook's distance
par(mfrow=c(1,1),mar=c(6, 5, 4, 2) + 0.1)
plot(cd, xlab="Index",ylab="Cook's distance",
main=paste("Cook's Distance \ncut-off: 4/N =",round(cutoff,3),
" ( NInfl =",nrow(pm[pm$infl==1,]),")"),
pch=20,cex.lab=1.6,cex.axis=1.4,col.main="blue")
points(pm[pm$cd>=cutoff,"index"],
pm[pm$cd>=cutoff,"cd"],pch=20,cex=1.3,col="red")
text(pm[pm$cd>=cutoff,"index"]-10,
pm[pm$cd>=cutoff,"cd"]-0.005,
paste(pm[pm$cd>=cutoff,"ID"],pm[pm$cd>=cutoff,"day"]),cex=.8)
abline(h=cutoff,
lwd=1.8,col="blue",lty=2)
# participants showing cases above the cutoff
levels(as.factor(as.character(pm[pm$cd>cutoff,"ID"])))
## [1] "EM06" "EM07" "EM09" "EM22" "EM23" "EM36" "EM42" "em51" "EM53" "MZ04"
## [11] "MZ08" "MZ22" "MZ25" "MZ34" "MZ39" "MZ40" "MZ44" "MZ46" "MZ50" "MZ55"
## [21] "MZ56"
Comments:
In the next step, we re-estimate the parameters by excluding each potentially influential participant, one-by-one. We focus on substantial changes in the estimated parameters, such that the excluded cases implies a drastic change in the interpretation of the effects.
# parameters with all participants
parameters <- as.data.frame(round(summary(m)$coefficients,3))
parameters$ID <- "All"
IDs <- levels(as.factor(as.character(PM[PM$cd>cutoff,"ID"]))) # ID-day couples with CD > cutoff
for(ID in IDs){
new.data <- PM[PM$ID!=ID,]
m4 <- lmer(TST~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),new.data)
new.data <- as.data.frame(round(summary(m4)$coefficients,3))
new.data$ID <- ID
parameters <- rbind(parameters,new.data)
}
parameters <- cbind(parameters,rep(c("Intercept","week.timewe","SEXM","MEQ.r","PSQI","BDI.II","NAP",
"ALCOL_week","FUMO..SI.NO.1","CAFFE_week"),
length(IDs)+1))
colnames(parameters) <- c("est","st.err","t","ID","pred")
parameters
# week.time (the code for the remaining parameters is not showed)
p1 <- ggplot(parameters[parameters$pred=="week.timewe",],aes(ID,est))+
geom_point()+ggtitle("week.time effect - original (red) vs. removing cases")+xlab("")+
geom_point(data=parameters[parameters$pred=="week.timewe"¶meters$ID=="All",],colour="red")
p2 <- ggplot(parameters[parameters$pred=="week.timewe",],aes(ID,t))+
geom_point()+ggtitle("week.time effect - original (red) vs. removing cases")+xlab("")+
geom_point(data=parameters[parameters$pred=="week.timewe"¶meters$ID=="All",],colour="red")
gridExtra::grid.arrange(p1,p2,nrow=2)
Comments:
Larger differences are shown by the parameters estimated for week.time (ranging from 12 to 16, with t-values ranging from 2.0 to 2.4), MEQ-R (ranging from -1.25 to 0.00, with t-values ranging from -1.00 to 0.00, and MZ44 and MZ56 leading to an average decrease of .4 in the estimated parameter), ALCOHOL (ranging from -1.6 to -.2, with t-values ranging from -1.0 to 0.0, and MZ44 leading to an absolute increase of .75 in the estimated parameter), SMOKE (ranging from -15 to -8, with t-values ranging from -1.4 to -.8, and EM06, EM22 and EM36 leading to an average absolute increase of .2 in the estimated parameter).
EM22 and EM36 are among the most influential cases. Interestingly, they are also two of the four outliers found for SE.
None of the potentially influential cases seems to substantially change the interpretation of the results (i.e., non-significant effects remain non-significant, positive effects remain positive and negative effects remain negative).
Conclusions:
We decide to keep all participants in TST analyses.
# taking only ID and variables of interest (only continuous), specifying the model
data2check <- sleep.long[!is.na(sleep.long$SE)&!is.na(sleep.long$week.time)&!is.na(sleep.long$SEX)&
!is.na(sleep.long$MEQ.r)&!is.na(sleep.long$PSQI)&!is.na(sleep.long$BDI.II)&
!is.na(sleep.long$ALCOL_week)&!is.na(sleep.long$FUMO..SI.NO.)&
!is.na(sleep.long$CAFFE_week),
c("ID","day","SE","week.time","SEX","MEQ.r","PSQI","BDI.II","NAP",
"ALCOL_week","FUMO..SI.NO.","CAFFE_week")]
m <- lmer(SE~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),data2check)
# computing Cook's dinstance and cutoff
cd <- cooks.distance(m)
data2check$cd <- cd
cutoff <- 4/length(levels(data2check$ID)) # 4/N
# selecting influential cases
index <- c(1:nrow(data2check)) # nrow
pm=cbind(data2check,cd,index)
pm=pm[order(pm$cd,decreasing = TRUE),] # order
pm$infl <- NA
for(i in 1:nrow(pm)){
if(pm[i,"cd"] >= cutoff){ pm[i,"infl"] <- 1 } else { pm[i,"infl"] <- 0 }
}
PM <- pm
# Plotting mahalanobis distance
par(mfrow=c(1,1),mar=c(6, 5, 4, 2) + 0.1)
plot(cd, xlab="Index",ylab="Cook's distance",
main=paste("Cook's Distance \ncut-off: 4/N =",round(cutoff,3),
" ( NInfl =",nrow(pm[pm$infl==1,]),")"),
pch=20,cex.lab=1.6,cex.axis=1.4,col.main="blue")
points(pm[pm$cd>=cutoff,"index"],
pm[pm$cd>=cutoff,"cd"],pch=20,cex=1.3,col="red")
text(pm[pm$cd>=cutoff,"index"]-10,
pm[pm$cd>=cutoff,"cd"]-0.005,
paste(pm[pm$cd>=cutoff,"ID"],pm[pm$cd>=cutoff,"day"]),cex=.8)
abline(h=cutoff,
lwd=1.8,col="blue",lty=2)
# participants showing cases above the cutoff
levels(as.factor(as.character(pm[pm$cd>cutoff,"ID"])))
## [1] "EM02" "EM04" "EM07" "EM08" "EM10" "EM12" "EM16" "EM22" "EM23" "EM29"
## [11] "EM36" "EM39" "EM54" "MZ03" "MZ04" "MZ05" "MZ09" "MZ15" "MZ21" "MZ25"
## [21] "MZ30" "MZ35" "MZ36" "MZ38" "MZ53" "MZ56"
Comments:
In the next step, we re-estimate the parameters by excluding each potentially influential participant, one-by-one. We focus on substantial changes in the estimated parameters, such that the excluded cases implies a drastic change in the interpretation of the effects.
# parameters with all participants
parameters <- as.data.frame(round(summary(m)$coefficients,3))
parameters$ID <- "All"
IDs <- levels(as.factor(as.character(data2check[data2check$cd>cutoff,"ID"]))) # ID-day couples with CD > cutoff
for(ID in IDs){
new.data <- data2check[data2check$ID!=ID,]
m4 <- lmer(SE~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),new.data)
new.data <- as.data.frame(round(summary(m4)$coefficients,3))
new.data$ID <- ID
parameters <- rbind(parameters,new.data)
}
parameters <- cbind(parameters,rep(c("Intercept","week.timewe","SEXM","MEQ.r","PSQI","BDI.II","NAP",
"ALCOL_week","FUMO..SI.NO.1","CAFFE_week"),
length(IDs)+1))
colnames(parameters) <- c("est","st.err","t","ID","pred")
parameters
# week.time (the code for the remaining parameters is not showed)
p1 <- ggplot(parameters[parameters$pred=="week.timewe",],aes(ID,est))+
geom_point()+ggtitle("week.time effect - original (red) vs. removing cases")+xlab("")+
geom_point(data=parameters[parameters$pred=="week.timewe"¶meters$ID=="All",],colour="red")+
geom_text(data=parameters[parameters$pred=="week.timewe" & (parameters$est<=.1|parameters$est>=.3),],
aes(label=ID),nudge_x=-1) + theme(axis.text.x = element_blank())
p2 <- ggplot(parameters[parameters$pred=="week.timewe",],aes(ID,t))+
geom_point()+ggtitle("week.time effect - original (red) vs. removing cases")+xlab("")+
geom_point(data=parameters[parameters$pred=="week.timewe"¶meters$ID=="All",],colour="red")+
geom_text(data=parameters[parameters$pred=="week.timewe" & (parameters$est<=.1|parameters$est>=.3),],
aes(label=ID),nudge_x=-1) + theme(axis.text.x = element_blank())
gridExtra::grid.arrange(p1,p2,nrow=2)
Comments:
Larger differences are shown by the parameter estimated for NAP (ranging from -.6 to -1.1, with t-values ranging from -1.6 to -2.3, and EM07 showing the strongest influence, leading to an increase of .4 in the estimated NAP coefficient). Further differences are shown by the parameters estimated for sex (ranging from -2.0 to -1.0, with t-values ranging from -1.6 to -1.0, and MZ53 leading to an absolute increase of .75 in the estimated parameter), and smoke (ranging from -3 to -1.5, with t-values ranging from -2.5 to -1.5, and MZ53 leading to an absolute increase of 1.0 in the estimated parameter).
MZ53 shows a degree of influence on almost all parameters, although the interpretation of the results does not change substantially (i.e., non-significant effects remain non-significant, positive effects remain positive and negative effects remain negative). The only exception is smoke, whose effect seems to strongly depend on MZ53.
MZ53 is also one of the four outliers found for SE, with all nights showing an SE < .70.
Thus, we replicate the influential analysis by removing EM07 and MZ53. We start with the former.
OUT <- data2check[data2check$ID!="EM07",] # excluding influential case
m <- lmer(SE~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),OUT)
# parameters with all participants
parameters <- as.data.frame(round(summary(m)$coefficients,3))
parameters$ID <- "All"
IDs <- levels(as.factor(as.character(OUT[OUT$cd>cutoff,"ID"]))) # ID-day couples with CD > cutoff
for(ID in IDs){
new.data <- OUT[OUT$ID!=ID,]
m4 <- lmer(SE~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),new.data)
new.data <- as.data.frame(round(summary(m4)$coefficients,3))
new.data$ID <- ID
parameters <- rbind(parameters,new.data)
}
parameters <- cbind(parameters,rep(c("Intercept","week.timewe","SEXM","MEQ.r","PSQI","BDI.II","NAP",
"ALCOL_week","FUMO..SI.NO.1","CAFFE_week"),
length(IDs)+1))
colnames(parameters) <- c("est","st.err","t","ID","pred")
parameters
Comments:
The exclusion of EM07 decreases the effect of NAP and makes the estimated parameter more consistent.
Participant MZ53 continues to show a degree of influence on many parameters, and particularly on smoke.
Thus, we replicate the influential analysis by removing MZ53.
OUT <- OUT[OUT$ID!="MZ53",] # excluding influential case
m <- lmer(SE~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),OUT)
# parameters with all participants
parameters <- as.data.frame(round(summary(m)$coefficients,3))
parameters$ID <- "All"
IDs <- levels(as.factor(as.character(OUT[OUT$cd>cutoff,"ID"]))) # ID-day couples with CD > cutoff
for(ID in IDs){
new.data <- OUT[OUT$ID!=ID,]
m4 <- lmer(SE~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),new.data)
new.data <- as.data.frame(round(summary(m4)$coefficients,3))
new.data$ID <- ID
parameters <- rbind(parameters,new.data)
}
parameters <- cbind(parameters,rep(c("Intercept","week.timewe","SEXM","MEQ.r","PSQI","BDI.II","NAP",
"ALCOL_week","FUMO..SI.NO.1","CAFFE_week"),
length(IDs)+1))
colnames(parameters) <- c("est","st.err","t","ID","pred")
parameters
Comments:
The exclusion of MZ53 decreases the effect of smoke and makes the estimated parameter more consistent. None of the other parameters show to be substantially affected by the removal of this participant.
We do not observe other potentially influential cases.
Conclusions:
We decide to perform the analyses on SE without EM07 and MZ53.
# taking only ID and variables of interest (only continuous), specifying the model
data2check <- sleep.long[!is.na(sleep.long$SOL)&!is.na(sleep.long$week.time)&!is.na(sleep.long$SEX)&
!is.na(sleep.long$MEQ.r)&!is.na(sleep.long$PSQI)&!is.na(sleep.long$BDI.II)&
!is.na(sleep.long$ALCOL_week)&!is.na(sleep.long$FUMO..SI.NO.)&
!is.na(sleep.long$CAFFE_week),
c("ID","day","SOL","week.time","SEX","MEQ.r","PSQI","BDI.II","NAP",
"ALCOL_week","FUMO..SI.NO.","CAFFE_week")]
m <- lmer(log(SOL+.05)~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),sleep.long)
# computing Cook's dinstance and cutoff
cd <- cooks.distance(m)
data2check$cd <- cd
cutoff <- 4/length(levels(data2check$ID)) # 4/N
# selecting influential cases
index <- c(1:nrow(data2check)) # nrow
pm=cbind(data2check,cd,index)
pm=pm[order(pm$cd,decreasing = TRUE),] # order
pm$infl <- NA
for(i in 1:nrow(pm)){
if(pm[i,"cd"] >= cutoff){ pm[i,"infl"] <- 1 } else { pm[i,"infl"] <- 0 }
}
PM <- pm
# Plotting mahalanobis distance
par(mfrow=c(1,1),mar=c(6, 5, 4, 2) + 0.1)
plot(cd, xlab="Index",ylab="Cook's distance",
main=paste("Cook's Distance \ncut-off: 4/N =",round(cutoff,3),
" ( NInfl =",nrow(pm[pm$infl==1,]),")"),
pch=20,cex.lab=1.6,cex.axis=1.4,col.main="blue")
points(pm[pm$cd>=cutoff,"index"],
pm[pm$cd>=cutoff,"cd"],pch=20,cex=1.3,col="red")
text(pm[pm$cd>=cutoff,"index"]-10,
pm[pm$cd>=cutoff,"cd"]-0.005,
paste(pm[pm$cd>=cutoff,"ID"],pm[pm$cd>=cutoff,"day"]),cex=.8)
abline(h=cutoff,
lwd=1.8,col="blue",lty=2)
# participants showing cases above the cutoff
levels(as.factor(as.character(pm[pm$cd>cutoff,"ID"])))
## [1] "EM04" "EM05" "EM08" "EM09" "EM12" "EM18" "EM23" "EM24" "EM26" "EM44"
## [11] "EM53" "EM54" "MZ02" "MZ04" "MZ05" "MZ14" "MZ15" "MZ21" "MZ22" "MZ23"
## [21] "MZ31" "MZ35" "MZ36" "MZ39" "MZ47" "MZ49" "MZ50" "MZ51" "MZ55" "MZ57"
Comments:
In the next step, we re-estimate the parameters by excluding each potentially influential participant, one-by-one. We focus on substantial changes in the estimated parameters, such that the excluded cases implies a drastic change in the interpretation of the effects.
# parameters with all participants
parameters <- as.data.frame(round(summary(m)$coefficients,3))
parameters$ID <- "All"
IDs <- levels(as.factor(as.character(data2check[data2check$cd>cutoff,"ID"]))) # ID-day couples with CD > cutoff
for(ID in IDs){
new.data <- data2check[data2check$ID!=ID,]
m4 <- lmer(log(SOL+.05)~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),new.data)
new.data <- as.data.frame(round(summary(m4)$coefficients,3))
new.data$ID <- ID
parameters <- rbind(parameters,new.data)
}
parameters <- cbind(parameters,rep(c("Intercept","week.timewe","SEXM","MEQ.r","PSQI","BDI.II","NAP",
"ALCOL_week","FUMO..SI.NO.1","CAFFE_week"),
length(IDs)+1))
colnames(parameters) <- c("est","st.err","t","ID","pred")
parameters
# week.time (the code for the remaining parameters is not showed)
p1 <- ggplot(parameters[parameters$pred=="week.timewe",],aes(ID,est))+
geom_point()+ggtitle("week.time effect - original (red) vs. removing cases")+xlab("")+
geom_point(data=parameters[parameters$pred=="week.timewe"¶meters$ID=="All",],colour="red")+
geom_text(data=parameters[parameters$pred=="week.timewe" & parameters$est<=(-.1),],
aes(label=ID),nudge_x=-1) + theme(axis.text.x = element_blank())
p2 <- ggplot(parameters[parameters$pred=="week.timewe",],aes(ID,t))+
geom_point()+ggtitle("week.time effect - original (red) vs. removing cases")+xlab("")+
geom_point(data=parameters[parameters$pred=="week.timewe"¶meters$ID=="All",],colour="red")+
geom_text(data=parameters[parameters$pred=="week.timewe" & parameters$est<=(-.1),],
aes(label=ID),nudge_x=-1) + theme(axis.text.x = element_blank())
gridExtra::grid.arrange(p1,p2,nrow=2)
Comments:
None of the estimated parameters seems to be substantially affected by potentially influential cases.
Two participants show stronger influence on several parameters: MZ15 for sex, MEQ-r, PSQI, and BDI; MZ23 for alcohol and coffee. However, also these participants do not change the interpretation of the results (i.e., non-significant effects remain non-significant, positive effects remain positive and negative effects remain negative).
Conclusions:
We decide to keep all participants in SOL analyses.
# taking only ID and variables of interest (only continuous), specifying the model
data2check <- sleep.long[!is.na(sleep.long$WASO)&!is.na(sleep.long$week.time)&!is.na(sleep.long$SEX)&
!is.na(sleep.long$MEQ.r)&!is.na(sleep.long$PSQI)&!is.na(sleep.long$BDI.II)&
!is.na(sleep.long$ALCOL_week)&!is.na(sleep.long$FUMO..SI.NO.)&
!is.na(sleep.long$CAFFE_week),
c("ID","day","WASO","week.time","SEX","MEQ.r","PSQI","BDI.II","NAP",
"ALCOL_week","FUMO..SI.NO.","CAFFE_week")]
m <- lmer(WASO~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),data2check)
# computing Cook's dinstance and cutoff
cd <- cooks.distance(m)
data2check$cd <- cd
cutoff <- 4/length(levels(data2check$ID)) # 4/N
# selecting influential cases
index <- c(1:nrow(data2check)) # nrow
pm=cbind(data2check,cd,index)
pm=pm[order(pm$cd,decreasing = TRUE),] # order
pm$infl <- NA
for(i in 1:nrow(pm)){
if(pm[i,"cd"] >= cutoff){ pm[i,"infl"] <- 1 } else { pm[i,"infl"] <- 0 }
}
PM <- pm
# Plotting mahalanobis distance
par(mfrow=c(1,1),mar=c(6, 5, 4, 2) + 0.1)
plot(cd, xlab="Index",ylab="Cook's distance",
main=paste("Cook's Distance \ncut-off: 4/N =",round(cutoff,3),
" ( NInfl =",nrow(pm[pm$infl==1,]),")"),
pch=20,cex.lab=1.6,cex.axis=1.4,col.main="blue")
points(pm[pm$cd>=cutoff,"index"],
pm[pm$cd>=cutoff,"cd"],pch=20,cex=1.3,col="red")
text(pm[pm$cd>=cutoff,"index"]-10,
pm[pm$cd>=cutoff,"cd"]-0.005,
paste(pm[pm$cd>=cutoff,"ID"],pm[pm$cd>=cutoff,"day"]),cex=.8)
abline(h=cutoff,
lwd=1.8,col="blue",lty=2)
# participants showing cases above the cutoff
levels(as.factor(as.character(pm[pm$cd>cutoff,"ID"])))
## [1] "EM04" "EM07" "EM09" "EM10" "EM12" "EM17" "EM22" "EM29" "EM38" "EM39"
## [11] "EM50" "EM53" "EM54" "EM55" "MZ03" "MZ16" "MZ17" "MZ25" "MZ30" "MZ34"
## [21] "MZ36" "MZ42" "MZ46" "MZ49" "MZ53" "MZ56"
Comments:
In the next step, we re-estimate the parameters by excluding each potentially influential participant, one-by-one. We focus on substantial changes in the estimated parameters, such that the excluded cases implies a drastic change in the interpretation of the effects.
# parameters with all participants
parameters <- as.data.frame(round(summary(m)$coefficients,3))
parameters$ID <- "All"
IDs <- levels(as.factor(as.character(PM[PM$cd>cutoff,"ID"]))) # ID-day couples with CD > cutoff
for(ID in IDs){
new.data <- PM[PM$ID!=ID,]
m4 <- lmer(WASO~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),new.data)
new.data <- as.data.frame(round(summary(m4)$coefficients,3))
new.data$ID <- ID
parameters <- rbind(parameters,new.data)
}
parameters <- cbind(parameters,rep(c("Intercept","week.timewe","SEXM","MEQ.r","PSQI","BDI.II","NAP",
"ALCOL_week","FUMO..SI.NO.1","CAFFE_week"),
length(IDs)+1))
colnames(parameters) <- c("est","st.err","t","ID","pred")
parameters
# week.time (the code for the remaining parameters is not showed)
p1 <- ggplot(parameters[parameters$pred=="week.timewe",],aes(ID,est))+
geom_point()+ggtitle("week.time effect - original (red) vs. removing cases")+xlab("")+
geom_point(data=parameters[parameters$pred=="week.timewe"¶meters$ID=="All",],colour="red")+
geom_text(data=parameters[parameters$pred=="week.timewe" & parameters$est>=1.5,],
aes(label=ID),nudge_x=-1) + theme(axis.text.x = element_blank())
p2 <- ggplot(parameters[parameters$pred=="week.timewe",],aes(ID,t))+
geom_point()+ggtitle("week.time effect - original (red) vs. removing cases")+xlab("")+
geom_point(data=parameters[parameters$pred=="week.timewe"¶meters$ID=="All",],colour="red")+
geom_text(data=parameters[parameters$pred=="week.timewe" & parameters$est>=1.5,],
aes(label=ID),nudge_x=-1) + theme(axis.text.x = element_blank())
gridExtra::grid.arrange(p1,p2,nrow=2)
Comments:
Larger differences are shown by the parameter estimated for smoke (ranging from 7 to 13, with t-values ranging from 1.4 to 2.2), whose effect is strongly influenced by MZ53 (leading to an increase of 4 in the estimated smoke coefficient).
MZ53 is the participant influencing more coefficients (week.time, sex, MEQ-r, PSQI, alcohol, and smoke), followed by EM54 (affecting the parameter for alcohol), MZ16 (influencing the effect of coffee) and MZ25 (influencing the effect of BDI-II). MZ53 is the most influential case (this participant is also one of the outliers for SE).
Thus, we replicate the influential analysis by removing MZ53.
OUT <- data2check[data2check$ID!="MZ53",] # excluding influential case
m <- lmer(WASO~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),OUT)
# parameters with all participants
parameters <- as.data.frame(round(summary(m)$coefficients,3))
parameters$ID <- "All"
IDs <- levels(as.factor(as.character(OUT[OUT$cd>cutoff,"ID"]))) # ID-day couples with CD > cutoff
for(ID in IDs){
new.data <- OUT[OUT$ID!=ID,]
m4 <- lmer(WASO~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),new.data)
new.data <- as.data.frame(round(summary(m4)$coefficients,3))
new.data$ID <- ID
parameters <- rbind(parameters,new.data)
}
parameters <- cbind(parameters,rep(c("Intercept","week.timewe","SEXM","MEQ.r","PSQI","BDI.II","NAP",
"ALCOL_week","FUMO..SI.NO.1","CAFFE_week"),
length(IDs)+1))
colnames(parameters) <- c("est","st.err","t","ID","pred")
parameters
Comments:
The exclusion of MZ53 decreases and makes more consistent the parameter estimated for smoke. Slight changes are observable also for other parameters.
There are still some participants showing a degree of influence on the estimation of the parameters, especially MZ25 (affecting sex and PSQI), MZ25 (affecting PSQI, BDI-II, and smoke), and MZ36 (affecting sex, alcohol, and smoke). However, the exclusion of these participant does not substantially change the estimated parameters.
Conclusions:
We decide to exclude only **MZ53* from WASo analyses.
# taking only ID and variables of interest (only continuous), specifying the model
data2check <- sleep.long[!is.na(sleep.long$BT)&!is.na(sleep.long$week.time)&!is.na(sleep.long$SEX)&
!is.na(sleep.long$MEQ.r)&!is.na(sleep.long$PSQI)&!is.na(sleep.long$BDI.II)&
!is.na(sleep.long$ALCOL_week)&!is.na(sleep.long$FUMO..SI.NO.)&
!is.na(sleep.long$CAFFE_week),
c("ID","day","BT","week.time","SEX","MEQ.r","PSQI","BDI.II","NAP",
"ALCOL_week","FUMO..SI.NO.","CAFFE_week")]
m <- lmer(BT~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),data2check)
# computing Cook's dinstance and cutoff
cd <- cooks.distance(m)
data2check$cd <- cd
cutoff <- 4/length(levels(data2check$ID)) # 4/N
# selecting influential cases
index <- c(1:nrow(data2check)) # nrow
pm=cbind(data2check,cd,index)
pm=pm[order(pm$cd,decreasing = TRUE),] # order
pm$infl <- NA
for(i in 1:nrow(pm)){
if(pm[i,"cd"] >= cutoff){ pm[i,"infl"] <- 1 } else { pm[i,"infl"] <- 0 }
}
PM <- pm
# Plotting mahalanobis distance
par(mfrow=c(1,1),mar=c(6, 5, 4, 2) + 0.1)
plot(cd, xlab="Index",ylab="Cook's distance",
main=paste("Cook's Distance \ncut-off: 4/N =",round(cutoff,3),
" ( NInfl =",nrow(pm[pm$infl==1,]),")"),
pch=20,cex.lab=1.6,cex.axis=1.4,col.main="blue")
points(pm[pm$cd>=cutoff,"index"],
pm[pm$cd>=cutoff,"cd"],pch=20,cex=1.3,col="red")
text(pm[pm$cd>=cutoff,"index"]-10,
pm[pm$cd>=cutoff,"cd"]-0.005,
paste(pm[pm$cd>=cutoff,"ID"],pm[pm$cd>=cutoff,"day"]),cex=.8)
abline(h=cutoff,
lwd=1.8,col="blue",lty=2)
# participants showing cases above the cutoff
levels(as.factor(as.character(pm[pm$cd>cutoff,"ID"])))
## [1] "EM07" "EM08" "EM09" "EM22" "EM23" "EM29" "EM36" "EM40" "EM42" "EM45"
## [11] "MZ03" "MZ04" "MZ17" "MZ22" "MZ25" "MZ34" "MZ40" "MZ50" "MZ53" "MZ54"
## [21] "MZ55" "MZ56"
Comments:
In the next step, we re-estimate the parameters by excluding each potentially influential participant, one-by-one. We focus on substantial changes in the estimated parameters, such that the excluded cases implies a drastic change in the interpretation of the effects.
# parameters with all participants
parameters <- as.data.frame(round(summary(m)$coefficients,3))
parameters$ID <- "All"
IDs <- levels(as.factor(as.character(data2check[data2check$cd>cutoff,"ID"]))) # ID-day couples with CD > cutoff
for(ID in IDs){
new.data <- PM[PM$ID!=ID,]
m4 <- lmer(BT~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),new.data)
new.data <- as.data.frame(round(summary(m4)$coefficients,3))
new.data$ID <- ID
parameters <- rbind(parameters,new.data)
}
parameters <- cbind(parameters,rep(c("Intercept","week.timewe","SEXM","MEQ.r","PSQI","BDI.II","NAP",
"ALCOL_week","FUMO..SI.NO.1","CAFFE_week"),
length(IDs)+1))
colnames(parameters) <- c("est","st.err","t","ID","pred")
parameters
# week.time (the code for the remaining parameters is not showed)
p1 <- ggplot(parameters[parameters$pred=="week.timewe",],aes(ID,est))+
geom_point()+ggtitle("week.time effect - original (red) vs. removing cases")+xlab("")+
geom_point(data=parameters[parameters$pred=="week.timewe"¶meters$ID=="All",],colour="red")+
theme(axis.text.x = element_blank())
p2 <- ggplot(parameters[parameters$pred=="week.timewe",],aes(ID,t))+
geom_point()+ggtitle("week.time effect - original (red) vs. removing cases")+xlab("")+
geom_point(data=parameters[parameters$pred=="week.timewe"¶meters$ID=="All",],colour="red")+
theme(axis.text.x = element_blank())
gridExtra::grid.arrange(p1,p2,nrow=2)
Comments:
Two participants show stronger influence on several parameters: EM36 for week.time, sex, MEQ-r, PSQI, and smoke; MZ22 for EQr, alcohol and coffee. Alcohol is the mostly influenced parameter, whose magitude is decreased .072 (t = 1.89) to .047 (t = 1.22).
However, the exclusion of these participants does not change the interpretation of the results (i.e., non-significant effects remain non-significant, positive effects remain positive and negative effects remain negative).
Conclusions:
We decide to keep all participants in BT data analysis.
# taking only ID and variables of interest (only continuous), specifying the model
data2check <- sleep.long[!is.na(sleep.long$WT.dayAfter)&!is.na(sleep.long$week.time)&!is.na(sleep.long$SEX)&
!is.na(sleep.long$MEQ.r)&!is.na(sleep.long$PSQI)&!is.na(sleep.long$BDI.II)&
!is.na(sleep.long$ALCOL_week)&!is.na(sleep.long$FUMO..SI.NO.)&
!is.na(sleep.long$CAFFE_week),
c("ID","day","WT.dayAfter","week.time","SEX","MEQ.r","PSQI","BDI.II","NAP",
"ALCOL_week","FUMO..SI.NO.","CAFFE_week")]
m <- lmer(WT.dayAfter~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),data2check)
# computing Cook's dinstance and cutoff
cd <- cooks.distance(m)
data2check$cd <- cd
cutoff <- 4/length(levels(data2check$ID)) # 4/N
# selecting influential cases
index <- c(1:nrow(data2check)) # nrow
pm=cbind(data2check,cd,index)
pm=pm[order(pm$cd,decreasing = TRUE),] # order
pm$infl <- NA
for(i in 1:nrow(pm)){
if(pm[i,"cd"] >= cutoff){ pm[i,"infl"] <- 1 } else { pm[i,"infl"] <- 0 }
}
PM <- pm
# Plotting mahalanobis distance
par(mfrow=c(1,1),mar=c(6, 5, 4, 2) + 0.1)
plot(cd, xlab="Index",ylab="Cook's distance",
main=paste("Cook's Distance \ncut-off: 4/N =",round(cutoff,3),
" ( NInfl =",nrow(pm[pm$infl==1,]),")"),
pch=20,cex.lab=1.6,cex.axis=1.4,col.main="blue")
points(pm[pm$cd>=cutoff,"index"],
pm[pm$cd>=cutoff,"cd"],pch=20,cex=1.3,col="red")
text(pm[pm$cd>=cutoff,"index"]-10,
pm[pm$cd>=cutoff,"cd"]-0.005,
paste(pm[pm$cd>=cutoff,"ID"],pm[pm$cd>=cutoff,"day"]),cex=.8)
abline(h=cutoff,
lwd=1.8,col="blue",lty=2)
# participants showing cases above the cutoff
levels(as.factor(as.character(pm[pm$cd>cutoff,"ID"])))
## [1] "EM01" "EM07" "EM08" "EM09" "EM17" "EM22" "EM30" "EM36" "EM39" "EM47"
## [11] "em51" "MZ03" "MZ04" "MZ08" "MZ17" "MZ22" "MZ25" "MZ39" "MZ40" "MZ49"
## [21] "MZ54" "MZ56"
Comments:
In the next step, we re-estimate the parameters by excluding each potentially influential participant, one-by-one. We focus on substantial changes in the estimated parameters, such that the excluded cases implies a drastic change in the interpretation of the effects.
# parameters with all participants
parameters <- as.data.frame(round(summary(m)$coefficients,3))
parameters$ID <- "All"
IDs <- levels(as.factor(as.character(data2check[data2check$cd>cutoff,"ID"]))) # ID-day couples with CD > cutoff
for(ID in IDs){
new.data <- PM[PM$ID!=ID,]
m4 <- lmer(WT.dayAfter~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),new.data)
new.data <- as.data.frame(round(summary(m4)$coefficients,3))
new.data$ID <- ID
parameters <- rbind(parameters,new.data)
}
parameters <- cbind(parameters,rep(c("Intercept","week.timewe","SEXM","MEQ.r","PSQI","BDI.II","NAP",
"ALCOL_week","FUMO..SI.NO.1","CAFFE_week"),
length(IDs)+1))
colnames(parameters) <- c("est","st.err","t","ID","pred")
parameters
# week.time (the code for the remaining parameters is not showed)
p1 <- ggplot(parameters[parameters$pred=="week.timewe",],aes(ID,est))+
geom_point()+ggtitle("week.time effect - original (red) vs. removing cases")+xlab("")+
geom_point(data=parameters[parameters$pred=="week.timewe"¶meters$ID=="All",],colour="red")+
theme(axis.text.x = element_blank())
p2 <- ggplot(parameters[parameters$pred=="week.timewe",],aes(ID,t))+
geom_point()+ggtitle("week.time effect - original (red) vs. removing cases")+xlab("")+
geom_point(data=parameters[parameters$pred=="week.timewe"¶meters$ID=="All",],colour="red")+
theme(axis.text.x = element_blank())
gridExtra::grid.arrange(p1,p2,nrow=2)
Comments:
None of the estimated parameters seems to be substantially affected by potentially influential cases.
MZ22 is the participant influencing more parameters, and particoularly MEQ-r, alcohol, smoke, and coffee comsumption. However, the exclusion of these participants does not change the interpretation of the results (i.e., non-significant effects remain non-significant, positive effects remain positive and negative effects remain negative). The strongest influence of this participant is on alcohol assumption, whose estimated parameter changes from .05 (t = 1.82) to .03 (t = 1.08) after the removal of MZ22.
infl <- influence.ME::influence(m,"ID")
cd <- influence.ME::cooks.distance.estex(infl, parameter=8, sort=TRUE)[,1]
cd[which(names(cd)=="MZ22")]
## MZ22
## 0.5399679
4/nlevels(data2check$ID) # cutoff
## [1] 0.04878049
Conclusions:
We decide to keep all participants in WT data analysis.
In this section, we use the information from previous sections to specify a set of LMER models for each sleep metric of interest. As mentioned above, we proceed with a hierarchical regression by including the predictors of interest one-by-one, and we perform a model comparison starting from a null model (with no predictors) until the model including all predictors. Each model is compared to the previous one using the likelihood ratio test (LRT) and the Aikake Information Criterion (AIC).
library(lme4); library(effects); library(MuMIn); library(arm) # modeling
library(gridExtra); library(jtools); library(knitr); library(sjPlot) # outputs
options(knitr.kable.NA = '')
For each sleep measure:
We specify the models including the predictors in the order mentioned above (see Aims and Content).
The set of models is compared using the LRT and the AIC. The AIC weight (AICw) is used to express AIC in a probabilistic way (i.e., each model is associated with a probability, from 0 to 1, to be the most plausible model).
The parameters estimated by the selected model are reported with their 95% confidence intervals (CI) obtained using bootstrap with the percentile method (with 10,000 replicates), their standard error (SE), and t-value.
The function modelComp is used to optimize the process and generate the output.
modComp <- function(data=sleep.long,measure="TIB",influentials=c("MZ56","EM42"),
CI.method="boot",boot.type="perc",CI.level=.95,nsim=10000,digits=2){
require(lme4); require(MuMIn)
# excluding influential cases
df <- data
for(i in 1:length(influentials)){ df <- df[df$ID!=influentials[i],] }
# model specification
models <- list(lmer(formula=gsub("TIB",measure,
"TIB~(1|ID)"), df), # null model
lmer(formula=gsub("TIB",measure,
"TIB~week.time+(1|ID)"),df), # week time
lmer(formula=gsub("TIB",measure,
"TIB~week.time+SEX+(1|ID)"), df), # SEX
lmer(formula=gsub("TIB",measure,
"TIB~week.time+SEX+MEQ.r+(1|ID)"), df), # MEQ.r
lmer(formula=gsub("TIB",measure,
"TIB~week.time+SEX+MEQ.r+PSQI+(1|ID)"), df), # PSQI
lmer(formula=gsub("TIB",measure,
"TIB~week.time+SEX+MEQ.r+PSQI+BDI.II+(1|ID)"), df), # BDI
lmer(formula=gsub("TIB",measure,
"TIB~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+(1|ID)"), df), # NAP
lmer(formula=gsub("TIB",measure,
"TIB~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID)"), df)) # sub
# likelihood ratio test
lrt <- as.data.frame(anova(models[[1]],models[[2]],models[[3]],models[[4]],models[[5]],
models[[6]],models[[7]],models[[8]]))
mc <- cbind(model=c("null","week.time","SEX","MEQ.r","PSQI","BDI.II","NAP","sub"),
round(lrt[,c(1,2,5:7)],digits),p=round(lrt$`Pr(>Chisq)`,digits+1))
# AIC weight
mc$AICw <- round(Weights(mc$AIC),digits)
# model selection (based on LRT and AIC)
bestMod <- mc[which(mc$AICw==max(mc$AICw)),"model"]
cat("selected model:",as.character(bestMod),"\n\n")
print(summary(models[[which(mc$model==bestMod)]]))
# selected model's parameters
coeff <- round(summary(models[[which(mc$model==bestMod)]])$coefficients,digits)
# boostrap CI
coeff <- cbind(coeff,round(confint.merMod(models[[which(mc$model==bestMod)]],
parm=3:(length(fixef(models[[which(mc$model==bestMod)]]))+2),
level=CI.level,method=CI.method,boot.type=boot.type,nsim=nsim),digits))
# joining model comparison and coefficients info
if(nrow(coeff) < nrow(mc)){
empty.row <- rep("",nrow(mc)-nrow(coeff))
out <- cbind(mc[,c("model","Chisq","Chi Df","p","AICw")],
rbind(coeff[,c(1:2,4:5,3)],
cbind(Estimate=empty.row,'Std. Error'=empty.row,
'2.5 %'=empty.row,'97.5 %'=empty.row,'t value'=empty.row)))
} else if(nrow(coeff) > nrow(mc)){
empty.row <- rep("",nrow(coeff)-nrow(mc))
out <- cbind(rbind(mc[,c("model","Chisq","Chi Df","p","AICw")],
cbind(model=empty.row,Chisq=empty.row,'Chi Df'=empty.row,
p=empty.row,AICw=empty.row)),
coeff[,c(1:2,4:5,3)]) }
rownames(out) <- 1:nrow(out)
out[,c("Estimate","Std. Error","2.5 %","97.5 %","t value")] <- lapply(out[,c("Estimate","Std. Error","2.5 %","97.5 %","t value")],
as.character)
out[,c("Estimate","Std. Error","2.5 %","97.5 %","t value")] <- lapply(out[,c("Estimate","Std. Error","2.5 %","97.5 %","t value")],
as.numeric)
return(out)
}
TIB is modeled assuming a normal distribution for residuals and excluding participant MZ56 and EM42 (see previous sections).
(TIB <- modComp(data=sleep.long,measure="TIB",influentials=c("MZ56","EM42"),
CI.method="boot",boot.type="perc",CI.level=.95,nsim=10000,digits=2))
## selected model: SEX
##
## Linear mixed model fit by REML ['lmerMod']
## Formula: TIB ~ week.time + SEX + (1 | ID)
## Data: df
##
## REML criterion at convergence: 1885.7
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.1865 -0.5921 0.0757 0.5594 3.6386
##
## Random effects:
## Groups Name Variance Std.Dev.
## ID (Intercept) 0.2214 0.4706
## Residual 1.5526 1.2460
## Number of obs: 557, groups: ID, 80
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 7.7926 0.1080 72.151
## week.timewe 0.1775 0.1167 1.521
## SEXM -0.4755 0.1493 -3.184
##
## Correlation of Fixed Effects:
## (Intr) wk.tmw
## week.timewe -0.309
## SEXM -0.654 -0.003
Comments:
The model including the effect of both Week time and Gender is the only one showing a significantly higher logLikelihood compared to the null model for TIB.
The model including Week time and Gender is also the model with the lowest AIC, followed by the model including MEQ.r (AICw = .21).
TIB is only predicted by participants’ Gender (with males showing shorter sleep time compared to females).
Below, the effects are plotted and a summary table is generated.
# effect plot
plot(effects::allEffects(lmer(TIB~week.time+SEX+(1|ID),sleep.long)))
Let’s see what the results would have been if MZ56 and EM42 were not excluded.
modComp(data=sleep.long,measure="TIB",influentials="none",
CI.method="boot",boot.type="perc",CI.level=.95,nsim=10000,digits=2)
## selected model: SEX
##
## Linear mixed model fit by REML ['lmerMod']
## Formula: TIB ~ week.time + SEX + (1 | ID)
## Data: df
##
## REML criterion at convergence: 1980.3
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.4029 -0.5651 0.0680 0.5462 3.5227
##
## Random effects:
## Groups Name Variance Std.Dev.
## ID (Intercept) 0.2317 0.4813
## Residual 1.6921 1.3008
## Number of obs: 571, groups: ID, 82
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 7.7155 0.1093 70.606
## week.timewe 0.2392 0.1203 1.988
## SEXM -0.4162 0.1526 -2.727
##
## Correlation of Fixed Effects:
## (Intr) wk.tmw
## week.timewe -0.315
## SEXM -0.644 -0.003
Comments:
Including MZ56 and EM42 would have led to an higher parameter estimated for Week Time, with higher TIB during weekend compared to weekdays (coeff. = .24, SE = .12, t = 1.99), while mantaining the effect of Gender.
Also in this case, the model with both Week time and Gender would have been selected based on AIC.
Conclusions:
TIB is significantly predicted only by participants’ gender, with shorter TIB in males compared to females. The effect of Week time was less evident and due to participant EM42.
TST is modeled assuming a normal distribution for residuals and including all participants (see previous sections).
(TST <- modComp(data=sleep.long,measure="TST",influentials="none",
CI.method="boot",boot.type="perc",CI.level=.95,nsim=10000,digits=2))
## selected model: SEX
##
## Linear mixed model fit by REML ['lmerMod']
## Formula: TST ~ week.time + SEX + (1 | ID)
## Data: df
##
## REML criterion at convergence: 1850.2
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.3662 -0.5428 0.0680 0.5632 3.8629
##
## Random effects:
## Groups Name Variance Std.Dev.
## ID (Intercept) 0.2774 0.5267
## Residual 1.3010 1.1406
## Number of obs: 571, groups: ID, 82
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 6.7188 0.1069 62.822
## week.timewe 0.2305 0.1055 2.184
## SEXM -0.4875 0.1509 -3.229
##
## Correlation of Fixed Effects:
## (Intr) wk.tmw
## week.timewe -0.282
## SEXM -0.651 -0.003
Comments:
The model including the effect of week.time, and that including also sex are the only models associated to a significantly higher logLikelihood compared to the null model for TST.
The model including only sex and week.time is also the model with the lowest AIC, follo. The dAIC suggests that the former is about 2.5 times more plausible than the latter.
TST is predicted by participants’ sex (with males showing shorter sleep duration), and by week time (with weekend being associated to longer sleep duration than weekdays).
Below, the effects are plotted.
# effect plot
plot(effects::allEffects(lmer(TST~week.time+SEX+(1|ID),sleep.long)))
Conclusions:
Total sleep time is only predicted by participants’ gender (with males showing a shorter duration than females) and week time (with longer TST during weekend compared to week days).
SE is modeled assuming a normal distribution for residuals and excluding participant EM07 and MZ53 (see previous sections).
(SE <- modComp(data=sleep.long,measure="SE",influentials=c("EM07","MZ53"),
CI.method="boot",boot.type="perc",CI.level=.95,nsim=10000,digits=2))
## selected model: null
##
## Linear mixed model fit by REML ['lmerMod']
## Formula: SE ~ (1 | ID)
## Data: df
##
## REML criterion at convergence: 3045.6
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.8514 -0.4831 0.0708 0.5593 3.3593
##
## Random effects:
## Groups Name Variance Std.Dev.
## ID (Intercept) 13.596 3.687
## Residual 9.912 3.148
## Number of obs: 557, groups: ID, 80
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 86.8377 0.4333 200.4
Comments:
None of the specified models shows a significantly higher logLikelihood than the null model.
Consistently, the null model shows the lowest AIC (AICw = .37), with model 1 (including only week.time) showing the second lowest AIC (AICw = .25).
Let’s see what the results would have been if EM07 and MZ53 were not excluded.
(SE <- modComp(data=sleep.long,measure="SE",influentials="none",
CI.method="boot",boot.type="perc",CI.level=.95,nsim=10000,digits=2))
## selected model: null
##
## Linear mixed model fit by REML ['lmerMod']
## Formula: SE ~ (1 | ID)
## Data: df
##
## REML criterion at convergence: 3247.3
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -8.1059 -0.4471 0.0718 0.5300 3.1171
##
## Random effects:
## Groups Name Variance Std.Dev.
## ID (Intercept) 21.56 4.644
## Residual 11.91 3.451
## Number of obs: 571, groups: ID, 82
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 86.4205 0.5328 162.2
summary(lmer(SE~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),data=sleep.long))
## Linear mixed model fit by REML ['lmerMod']
## Formula:
## SE ~ week.time + SEX + MEQ.r + PSQI + BDI.II + NAP + ALCOL_week +
## FUMO..SI.NO. + CAFFE_week + (1 | ID)
## Data: sleep.long
##
## REML criterion at convergence: 3238.3
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -7.9580 -0.4446 0.0905 0.5389 3.0974
##
## Random effects:
## Groups Name Variance Std.Dev.
## ID (Intercept) 20.07 4.480
## Residual 11.84 3.441
## Number of obs: 571, groups: ID, 82
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 91.040487 2.621406 34.730
## week.timewe 0.165767 0.318890 0.520
## SEXM -1.786462 1.170683 -1.526
## MEQ.r -0.129762 0.151364 -0.857
## PSQI -0.003708 0.231190 -0.016
## BDI.II -0.044624 0.082999 -0.538
## NAP1 -0.911895 0.454523 -2.006
## ALCOL_week 0.122842 0.177051 0.694
## FUMO..SI.NO.1 -2.611608 1.244778 -2.098
## CAFFE_week -0.117683 0.069559 -1.692
##
## Correlation of Fixed Effects:
## (Intr) wk.tmw SEXM MEQ.r PSQI BDI.II NAP1 ALCOL_ FUMO..
## week.timewe -0.036
## SEXM -0.212 -0.001
## MEQ.r -0.809 0.000 0.028
## PSQI -0.403 -0.001 0.151 0.039
## BDI.II -0.124 0.000 -0.149 0.140 -0.547
## NAP1 -0.019 0.056 0.019 -0.001 0.001 -0.016
## ALCOL_week -0.031 0.000 -0.445 0.026 -0.130 0.185 -0.010
## FUMO..SI.NO -0.168 -0.001 0.044 0.163 0.062 -0.095 -0.018 -0.291
## CAFFE_week -0.042 -0.001 0.197 -0.212 0.028 -0.054 -0.016 -0.332 -0.134
Comments:
Including EM07 and MZ53 would have led the LRT to indicate the most complex model as the best one.
In contrast, the null model is still the model with the highest AICw.
The most complex model shows an effect of NAP and Smoke, with lower SE on days with NAP compared to days without NAP (coef. = -.91, SE = .45, t = -2.01) and on smokers compared to non-smokers (coef. = -2.61, SE = 1.18, t = -2.10).
Conclusions:
None of the included predictors exerted a significant effect on SE. The effect of NAP was due to participant EM07, and the effect of Smoke was due to participant MZ53.
SOL is log-transformed and modeled assuming a normal distribution for residuals. All participants are included in the analysis (see previous sections).
(SOL <- modComp(data=sleep.long,measure="log(SOL+.05)",influentials="none",
CI.method="boot",boot.type="perc",CI.level=.95,nsim=10000,digits=2))
## selected model: sub
##
## Linear mixed model fit by REML ['lmerMod']
## Formula:
## log(SOL + 0.05) ~ week.time + SEX + MEQ.r + PSQI + BDI.II + NAP +
## ALCOL_week + FUMO..SI.NO. + CAFFE_week + (1 | ID)
## Data: df
##
## REML criterion at convergence: 2228.5
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.0754 -0.3678 0.1508 0.6414 2.1997
##
## Random effects:
## Groups Name Variance Std.Dev.
## ID (Intercept) 0.6508 0.8067
## Residual 2.3802 1.5428
## Number of obs: 571, groups: ID, 82
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 1.39856 0.56185 2.489
## week.timewe -0.04922 0.14293 -0.344
## SEXM 0.60455 0.25030 2.415
## MEQ.r -0.04750 0.03233 -1.469
## PSQI -0.03430 0.04937 -0.695
## BDI.II 0.03627 0.01774 2.045
## NAP1 0.23561 0.19593 1.203
## ALCOL_week -0.08856 0.03780 -2.343
## FUMO..SI.NO.1 -0.13342 0.26593 -0.502
## CAFFE_week 0.03043 0.01485 2.049
##
## Correlation of Fixed Effects:
## (Intr) wk.tmw SEXM MEQ.r PSQI BDI.II NAP1 ALCOL_ FUMO..
## week.timewe -0.075
## SEXM -0.211 -0.001
## MEQ.r -0.807 0.000 0.027
## PSQI -0.403 -0.001 0.151 0.040
## BDI.II -0.124 0.001 -0.152 0.140 -0.546
## NAP1 -0.038 0.054 0.039 -0.001 0.003 -0.032
## ALCOL_week -0.031 0.001 -0.446 0.026 -0.130 0.187 -0.020
## FUMO..SI.NO -0.168 -0.002 0.043 0.164 0.063 -0.094 -0.037 -0.290
## CAFFE_week -0.041 -0.002 0.196 -0.212 0.028 -0.053 -0.032 -0.332 -0.133
Comments:
The model including the effect of BDI-II and the most complex model (with substances) are the only models associated with a significantly higher logLikelihood compared to the null model for SE.
Coherently, the last model shows the lowest AIC (AICw = .44), with the model with BDI-II showing the second lowest AIC (AICw = .17).
SOL is mainly predicted by participants’ sex (with longer SOL for males), BDI-II score, and the weekly amount of consumed alcohol and coffee.
Below, the effects are plotted.
# effect plots
plot(effects::allEffects(lmer(log(SOL + 0.05) ~ week.time + SEX + MEQ.r + PSQI + BDI.II + NAP +
ALCOL_week + FUMO..SI.NO. + CAFFE_week + (1 | ID),sleep.long))[c(2,5,7,9)])
Conclusions:
The model comparison suggested a positive relationship between SOL and depressive symptoms, a negative relationship between SOL and the weekly amount of alcoholic units, a positive relationship between SOL and the weekly amount of coffees, and a higher SOL among smokers compared to non-smokers. Moreover, males showed shorted SOL than females.
WASO is modeled assuming a normal distribution for residuals and by excluding participant MZ53 (see previous sections).
(WASO <- modComp(data=sleep.long,measure="WASO",influentials="MZ53",
CI.method="boot",boot.type="perc",CI.level=.95,nsim=10000,digits=2))
## selected model: null
##
## Linear mixed model fit by REML ['lmerMod']
## Formula: WASO ~ (1 | ID)
## Data: df
##
## REML criterion at convergence: 5040.8
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.4436 -0.5891 -0.0494 0.5612 4.3741
##
## Random effects:
## Groups Name Variance Std.Dev.
## ID (Intercept) 285.1 16.88
## Residual 340.9 18.46
## Number of obs: 564, groups: ID, 81
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 51.710 2.031 25.46
Comments:
None of the specified models shows a significantly higher logLikelihood than the null model.
Consistently, the null model shows the lowest AIC (AICw = .46), with model 1 (including only week.time) showing the second lowest AIC (AICw = .32).
Let’s see what would have been the results if MZ53 was not excluded.
modComp(data=sleep.long,measure="WASO",influentials="none",
CI.method="boot",boot.type="perc",CI.level=.95,nsim=10000,digits=2)
## selected model: null
##
## Linear mixed model fit by REML ['lmerMod']
## Formula: WASO ~ (1 | ID)
## Data: df
##
## REML criterion at convergence: 5146.5
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.4217 -0.5571 -0.0399 0.5367 4.3105
##
## Random effects:
## Groups Name Variance Std.Dev.
## ID (Intercept) 392.8 19.82
## Residual 355.1 18.85
## Number of obs: 571, groups: ID, 82
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 52.885 2.327 22.73
Comments:
Conclusions:
None of the included predictors exerted a significant effect on WASO. The effect of Smoke was not substantial, and mainly due to participant MZ53.
BT is modeled assuming a normal distribution for residuals and including all participants (see previous sections).
(BT <- modComp(data=sleep.long,measure="BT",influentials="none",
CI.method="boot",boot.type="perc",CI.level=.95,nsim=10000,digits=2))
## selected model: MEQ.r
##
## Linear mixed model fit by REML ['lmerMod']
## Formula: BT ~ week.time + SEX + MEQ.r + (1 | ID)
## Data: df
##
## REML criterion at convergence: 2062
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.3704 -0.5531 -0.0790 0.4277 7.4683
##
## Random effects:
## Groups Name Variance Std.Dev.
## ID (Intercept) 0.8255 0.9086
## Residual 1.7353 1.3173
## Number of obs: 571, groups: ID, 82
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 26.77508 0.45815 58.442
## week.timewe 0.68769 0.12187 5.643
## SEXM 0.42605 0.23026 1.850
## MEQ.r -0.14951 0.03185 -4.694
##
## Correlation of Fixed Effects:
## (Intr) wk.tmw SEXM
## week.timewe -0.076
## SEXM -0.300 -0.002
## MEQ.r -0.937 0.000 0.073
# summary of the most complex model
summary(lmer(BT~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),data=sleep.long))
## Linear mixed model fit by REML ['lmerMod']
## Formula:
## BT ~ week.time + SEX + MEQ.r + PSQI + BDI.II + NAP + ALCOL_week +
## FUMO..SI.NO. + CAFFE_week + (1 | ID)
## Data: sleep.long
##
## REML criterion at convergence: 2076.1
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.3333 -0.5553 -0.0620 0.4410 7.4006
##
## Random effects:
## Groups Name Variance Std.Dev.
## ID (Intercept) 0.7522 0.8673
## Residual 1.7383 1.3185
## Number of obs: 571, groups: ID, 82
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 26.358165 0.563701 46.759
## week.timewe 0.688531 0.122158 5.636
## SEXM 0.182421 0.251344 0.726
## MEQ.r -0.143271 0.032475 -4.412
## PSQI 0.002539 0.049594 0.051
## BDI.II 0.001892 0.017814 0.106
## NAP1 0.010741 0.169745 0.063
## ALCOL_week 0.071955 0.037978 1.895
## FUMO..SI.NO.1 0.421426 0.267109 1.578
## CAFFE_week 0.002939 0.014918 0.197
##
## Correlation of Fixed Effects:
## (Intr) wk.tmw SEXM MEQ.r PSQI BDI.II NAP1 ALCOL_ FUMO..
## week.timewe -0.064
## SEXM -0.211 -0.001
## MEQ.r -0.808 0.000 0.027
## PSQI -0.403 -0.001 0.151 0.040
## BDI.II -0.124 0.001 -0.151 0.140 -0.547
## NAP1 -0.033 0.055 0.034 -0.001 0.002 -0.028
## ALCOL_week -0.031 0.001 -0.445 0.026 -0.130 0.186 -0.017
## FUMO..SI.NO -0.168 -0.002 0.043 0.164 0.063 -0.094 -0.032 -0.290
## CAFFE_week -0.041 -0.002 0.196 -0.212 0.028 -0.054 -0.028 -0.332 -0.133
Comments:
The model including the effect of week.time shows a significantly higher logLikelihood compared to the null model for BT. Moreover, the LRT suggests a significant contribution of MEQ-r score over week time and sex, and a significant contribution of the inclusion of substances over the other predictors.
In contrast, the model with MEQ-r is selected as the most plausible model based on the AIC (AICw = .44), with the most complex model showing the second best AIC (AICw = .31).
In the selected model, BT is predicted by the week time, with weekend days showing later BT compared to week days. Moreover, BT was predicted by the MEQ-r score, with earlier BT being predicted by higher scores (i.e., indicating morning circadian preferences). The effect of sex is not substantial.
The most complex model (selected based on the LRT but not the AIC), does not show any substantial effect of substances.
Below, the effects are plotted.
plot(effects::allEffects(lmer(BT ~ week.time + SEX + MEQ.r + (1|ID),sleep.long))[c(1,3)])
Conclusions:
The model’s comparison suggested a negative relationship between BT and the score on the MEQ-r, and later BT during the weekend compared to weekdays.
WT is modeled assuming a normal distribution for residuals and including all participants (see previous sections). Note that for testing the effects of week.time on this variable, we should use WT.dayAfter, which is referred to the subsequent day (e.g., WT on day 7 is Monday WT). In this way, Monday to Friday are considered weekdays, whereas Saturday and Sunday are considered weekend days.
(WT <- modComp(data=sleep.long,measure="WT.dayAfter",influentials="none",
CI.method="boot",boot.type="perc",CI.level=.95,nsim=10000,digits=2))
## selected model: MEQ.r
##
## Linear mixed model fit by REML ['lmerMod']
## Formula: WT.dayAfter ~ week.time + SEX + MEQ.r + (1 | ID)
## Data: df
##
## REML criterion at convergence: 1891.4
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.9263 -0.6002 -0.0830 0.5683 4.2610
##
## Random effects:
## Groups Name Variance Std.Dev.
## ID (Intercept) 0.4798 0.6927
## Residual 1.3376 1.1566
## Number of obs: 569, groups: ID, 82
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 10.462207 0.362748 28.842
## week.timewe 0.890752 0.107298 8.302
## SEXM -0.002575 0.182162 -0.014
## MEQ.r -0.146897 0.025204 -5.828
##
## Correlation of Fixed Effects:
## (Intr) wk.tmw SEXM
## week.timewe -0.081
## SEXM -0.300 -0.003
## MEQ.r -0.936 -0.003 0.074
# summary of the most complex model
summary(lmer(WT~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),data=sleep.long))
## Linear mixed model fit by REML ['lmerMod']
## Formula:
## WT ~ week.time + SEX + MEQ.r + PSQI + BDI.II + NAP + ALCOL_week +
## FUMO..SI.NO. + CAFFE_week + (1 | ID)
## Data: sleep.long
##
## REML criterion at convergence: 1967
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.8816 -0.5948 -0.0227 0.5381 3.7976
##
## Random effects:
## Groups Name Variance Std.Dev.
## ID (Intercept) 0.4098 0.6402
## Residual 1.5125 1.2298
## Number of obs: 569, groups: ID, 82
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 10.430786 0.446613 23.355
## week.timewe 0.252725 0.114017 2.217
## SEXM -0.178386 0.199107 -0.896
## MEQ.r -0.143042 0.025708 -5.564
## PSQI 0.005645 0.039253 0.144
## BDI.II -0.008849 0.014102 -0.628
## NAP1 0.099003 0.156954 0.631
## ALCOL_week 0.053742 0.030052 1.788
## FUMO..SI.NO.1 0.371894 0.211621 1.757
## CAFFE_week -0.001581 0.011803 -0.134
##
## Correlation of Fixed Effects:
## (Intr) wk.tmw SEXM MEQ.r PSQI BDI.II NAP1 ALCOL_ FUMO..
## week.timewe -0.075
## SEXM -0.210 0.001
## MEQ.r -0.807 0.000 0.027
## PSQI -0.403 -0.002 0.150 0.040
## BDI.II -0.124 0.001 -0.152 0.140 -0.545
## NAP1 -0.038 0.052 0.037 -0.004 0.005 -0.030
## ALCOL_week -0.031 0.000 -0.446 0.026 -0.129 0.187 -0.018
## FUMO..SI.NO -0.169 -0.004 0.041 0.164 0.064 -0.093 -0.036 -0.289
## CAFFE_week -0.041 -0.002 0.196 -0.211 0.027 -0.054 -0.033 -0.332 -0.133
Comments:
The model including the effect of week.time shows a significantly higher logLikelihood compared to the null model for BT. Moreover, the LRT suggests a significant contribution of the MEQ-r score over week time and sex, and a significant contribution of substances’ effect over the other predictors.
In contrast, the model with MEQ-r is selected as the most plausible model based on the AIC (AICw = .40), with the most complex model showing the second best AIC (AICw = .35).
In the selected model, WT is predicted by the week time, with weekend days showing later BT compared to week days. Moreover, BT is predicted by the MEQ-r score, with earlier BT being predicted by higher scores (i.e., indicating morning circadian preferences).
The most complex model (selected based on the LRT but not the AIC), does not show any substantial effect of substances.
Below, the effects are plotted.
plot(effects::allEffects(lmer(WT ~ week.time + SEX + MEQ.r + (1|ID),sleep.long))[c(1,3)])
Conclusions:
The model comparison suggested a negative relationship between WT and the score on the MEQ-r, and later WT during the weekend compared to weekdays.
Here, the outputs of the model comparisons are combined to generate the results table.
(results <- rbind(TIB,TST,SE,SOL,WASO,BT,WT))
write.csv(results,"ModelComp.csv")